langchain4j / langchain4j

Java version of LangChain
https://docs.langchain4j.dev
Apache License 2.0
4.78k stars, 942 forks

MarkdownHeaderTextSplitter #574

Open andyflury opened 9 months ago

andyflury commented 9 months ago

langchain4j already offers several DocumentSplitters. One that is currently missing is MarkdownHeaderTextSplitter.

The original (Python-based) LangChain project has such a MarkdownHeaderTextSplitter.

Would be nice to have this as well in langchain4j.

I've attached an implementation. I used ChatGPT to translate the Python code to Java (including the unit test). It might not conform to all of the project's coding standards, but it does the job and the tests pass.

MarkdownHeaderTextSplitter.zip
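
For readers without the attachment, here is a minimal sketch of the idea: split a Markdown string at its `#` headers and keep, for each chunk, the headers that were in effect above it. This is not the attached code and not a langchain4j or LangChain API. The class name `MarkdownHeaderSplitSketch`, the `Section` type, and the `split` method are all hypothetical, and the sketch assumes ATX-style headers (`#` to `######`) and treats fenced code blocks as plain body text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of langchain4j.
class MarkdownHeaderSplitSketch {

    /** A chunk of body text plus the headers (level -> title) above it. */
    static final class Section {
        final Map<Integer, String> headers;
        final String text;

        Section(Map<Integer, String> headers, String text) {
            this.headers = headers;
            this.text = text;
        }
    }

    static List<Section> split(String markdown) {
        List<Section> sections = new ArrayList<>();
        TreeMap<Integer, String> active = new TreeMap<>(); // headers currently in effect
        StringBuilder body = new StringBuilder();
        boolean inFence = false;

        for (String line : markdown.split("\\R", -1)) {
            String trimmed = line.trim();
            // Assumption: lines inside ``` fences are never treated as headers.
            if (trimmed.startsWith("```")) {
                inFence = !inFence;
            }
            int level = inFence ? 0 : headerLevel(trimmed);
            if (level == 0) {
                body.append(line).append('\n');
                continue;
            }
            // A header ends the current chunk ...
            flush(sections, active, body);
            // ... and replaces every header at the same or a deeper level.
            active.tailMap(level, true).clear();
            active.put(level, trimmed.substring(level).trim());
        }
        flush(sections, active, body);
        return sections;
    }

    /** Returns 1..6 for an ATX header line, or 0 if the line is not a header. */
    private static int headerLevel(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == '#') {
            n++;
        }
        boolean valid = n >= 1 && n <= 6
                && (n == line.length() || line.charAt(n) == ' ');
        return valid ? n : 0;
    }

    /** Emits the buffered body as a section (if non-blank) and resets the buffer. */
    private static void flush(List<Section> out, Map<Integer, String> active, StringBuilder body) {
        String text = body.toString().trim();
        if (!text.isEmpty()) {
            out.add(new Section(new TreeMap<>(active), text));
        }
        body.setLength(0);
    }
}
```

Calling `MarkdownHeaderSplitSketch.split(markdown)` returns one `Section` per non-empty block of text, so the header path can be stored as metadata alongside each chunk. A real contribution would presumably plug into langchain4j's existing splitter abstractions rather than return a custom type.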

langchain4j commented 9 months ago

@andyflury thank you!

mike-adonis commented 9 months ago

Hey @langchain4j, can I pick this up?

langchain4j commented 9 months ago

@mike-adonis sure, go ahead! Thank you!

langchain4j commented 8 months ago

@mike-adonis did you manage to start/implement it?

mike-adonis commented 8 months ago

Yes, I did start. I've just been busy with a few things. I can try to complete it by the end of the weekend.

mike-adonis commented 8 months ago

@langchain4j please find the PR for this: #690