Open orpiske opened 1 month ago
IMHO, this work should end up being part of langchain4j, and we can eventually use it as one of the tokenizing strategies in Apache Camel.
I don't think it's something that should go in Camel. Camel is an integration framework; tokenizing is a feature that belongs somewhere else.
> IMHO, this work should end up being part of langchain4j, and we can eventually use it as one of the tokenizing strategies in Apache Camel.
Yeah.
I also don't see it as being part of Camel, as rightly pointed out by @oscerd. It could be used by Camel, though.
So, I think a reasonable approach would be to create a Java library and then work to include support for it in langchain4j and Quarkus.
I would then move this discussion to the langchain4j issue tracker, so they can provide additional info or suggestions, as they may already have had the chance to think about it.
For reference, here's a discussion with the Langchain4j project. Their suggestion is to look at the DocumentSplitter interface and work on top of that.
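To make the "work on top of a splitter interface" idea concrete, here is a minimal sketch of one possible chunking strategy in plain Java. Everything in it is an assumption for illustration: `ParagraphChunker`, the nested `Splitter` interface, and `byParagraph` are hypothetical names, and `Splitter` is only a simplified stand-in, not the actual langchain4j `DocumentSplitter` contract (check its real signature before adapting this).

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphChunker {

    /** Hypothetical, simplified splitter contract (NOT the langchain4j interface). */
    interface Splitter {
        List<String> split(String text);
    }

    /**
     * Packs whole paragraphs into chunks of at most maxChars characters.
     * A single paragraph longer than maxChars is kept as one oversized chunk.
     */
    static Splitter byParagraph(int maxChars) {
        return text -> {
            List<String> chunks = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            // Paragraphs are separated by one or more blank lines.
            for (String paragraph : text.split("\\R\\s*\\R")) {
                String p = paragraph.strip();
                if (p.isEmpty()) {
                    continue;
                }
                // Flush the current chunk if adding this paragraph would exceed the limit.
                if (current.length() > 0 && current.length() + 2 + p.length() > maxChars) {
                    chunks.add(current.toString());
                    current.setLength(0);
                }
                if (current.length() > 0) {
                    current.append("\n\n");
                }
                current.append(p);
            }
            if (current.length() > 0) {
                chunks.add(current.toString());
            }
            return chunks;
        };
    }

    public static void main(String[] args) {
        String doc = "Camel is an integration framework.\n\n"
                + "Tokenizing belongs elsewhere.\n\n"
                + "Short.";
        for (String chunk : byParagraph(50).split(doc)) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

Keeping the strategy behind a small interface like this would let the same logic be wrapped later by whatever the real splitter interface requires, and reused from Camel or Quarkus without depending on either.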
We need to investigate chunking strategies that can help the assistant provide better answers: