Closed williamhogman closed 1 year ago
I had a problem with the previous version of this code, and as I see it, it's still here. Each document should be mapped to inject the base parameters *before* chunking, not after, since the parameters can affect the size of the document.
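To illustrate the point (a minimal, self-contained sketch; `inject_base_parameters` and `chunk_by_chars` are hypothetical stand-ins, not the llm-chain API): substituting base parameters can grow a document, so chunking before substitution can produce chunks that end up over-sized.

```rust
// Hypothetical sketch: parameter substitution changes document length,
// so it must happen before the document is split into chunks.
fn inject_base_parameters(doc: &str, params: &[(&str, &str)]) -> String {
    let mut out = doc.to_string();
    for (key, value) in params {
        // Replace each `{{key}}` placeholder with its value.
        out = out.replace(&format!("{{{{{key}}}}}"), value);
    }
    out
}

// Crude character-based splitter standing in for a real token-based one.
fn chunk_by_chars(doc: &str, max_len: usize) -> Vec<String> {
    doc.chars()
        .collect::<Vec<_>>()
        .chunks(max_len)
        .map(|c| c.iter().collect())
        .collect()
}

fn main() {
    let doc = "{{topic}} summary: {{topic}} details follow.";
    let params = [("topic", "a fairly long substituted value")];
    // Chunking before substitution under-counts the real size:
    let before = chunk_by_chars(doc, 40).len();
    let after = chunk_by_chars(&inject_base_parameters(doc, &params), 40).len();
    assert!(after > before);
    println!("chunks before substitution: {before}, after: {after}");
}
```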
I could take a look at all of this tonight
@i1i1 is taking a look as well :)
ah even better :)
As a reference for my problem: https://github.com/sobelio/llm-chain/blob/afaf1e335a4c0d016a2f5809a429c9a5f9492fd0/llm-chain/src/chains/map_reduce.rs#L82 (base parameters aren't passed in)
So far the document splitter looks okay; the issue might be in the OpenAI text splitter's tokenizer: https://github.com/sobelio/llm-chain/blob/main/llm-chain-openai/src/chatgpt/text_splitter.rs#L29
I'm not an expert there, but I think the wrong tokenizer is being used.
Found the issue: we don't account for prompt tokens here: https://github.com/sobelio/llm-chain/blob/afaf1e335a4c0d016a2f5809a429c9a5f9492fd0/llm-chain/src/frame.rs#L47
@i1i1 Nice find! So basically the problem is that map reduce counts the document's tokens but not the document + the prompt? :)
Yes!
Do you want to try to fix it? We can help you on Discord if you run into problems :)
MapReduce chains might need to stop using frame because of this.
It seems the context window is not properly reduced to leave room for the prompt.
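A sketch of the fix being discussed (hedged: `whitespace_token_count` is a crude stand-in for the real tokenizer, and the function names are illustrative, not llm-chain's API): the per-chunk token budget should be the context window minus the prompt's own tokens, plus whatever is reserved for the completion.

```rust
// Crude stand-in for a real tokenizer (e.g. the OpenAI one).
fn whitespace_token_count(text: &str) -> usize {
    text.split_whitespace().count()
}

// The chunk budget must subtract the prompt's own tokens from the model's
// context window, otherwise prompt + chunk can overflow it.
fn max_tokens_per_chunk(context_window: usize, prompt: &str, completion_reserve: usize) -> usize {
    context_window
        .saturating_sub(whitespace_token_count(prompt))
        .saturating_sub(completion_reserve)
}

fn main() {
    let prompt = "Summarize the following document in three sentences:";
    // e.g. a 4096-token context window, reserving 256 tokens for the answer
    let budget = max_tokens_per_chunk(4096, prompt, 256);
    println!("chunk budget: {budget} tokens");
}
```

Splitting documents against this reduced budget, rather than the full context window, is what "cutting down" the window amounts to here.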