sobelio / llm-chain

`llm-chain` is a powerful Rust crate for building chains in large language models, allowing you to summarise text and complete complex tasks.
https://llm-chain.xyz
MIT License
1.31k stars, 128 forks

Fix problem with map-reduce tutorial #110

Closed (williamhogman closed this 1 year ago)

williamhogman commented 1 year ago

It seems that the context window is not being cut down properly.

williamhogman commented 1 year ago

Either here: https://github.com/sobelio/llm-chain/blob/main/llm-chain/src/chains/map_reduce.rs#L122

or here: https://github.com/sobelio/llm-chain/blob/main/llm-chain/src/chains/map_reduce.rs#L155

Juzov commented 1 year ago

I had a problem with the previous version of this code, and as far as I can see it's still there. Each document should be mapped to inject the base parameters before it is chunked, not after, since the parameters can affect the size of the document.

I could take a look at all of this tonight
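To illustrate the ordering point above, here is a minimal sketch, not the actual `llm-chain` code. It assumes a `{{key}}` placeholder syntax and uses whitespace-separated words as a stand-in for real tokens; `inject_parameters`, `chunk_by_tokens`, and `map_then_chunk` are hypothetical names made up for this example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: replaces every `{{key}}` placeholder with its value.
/// The placeholder syntax is an assumption made for this sketch.
fn inject_parameters(template: &str, params: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (key, value) in params {
        // `{{{{{}}}}}` renders as `{{key}}` after format-string escaping.
        out = out.replace(&format!("{{{{{}}}}}", key), value);
    }
    out
}

/// Hypothetical helper: splits text into chunks of at most `max_tokens`
/// whitespace-separated words. A real tokenizer would count differently.
fn chunk_by_tokens(text: &str, max_tokens: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_tokens.max(1)) // guard against a zero chunk size
        .map(|chunk| chunk.join(" "))
        .collect()
}

/// Inject the parameters first, then chunk, so the chunk sizes reflect
/// the text that is actually sent to the model.
fn map_then_chunk(
    template: &str,
    params: &HashMap<&str, &str>,
    max_tokens: usize,
) -> Vec<String> {
    chunk_by_tokens(&inject_parameters(template, params), max_tokens)
}
```

With a template of `Summarize {{text}}` and a four-word value for `text`, chunking the raw template sees only two "tokens", while the injected text has five. Chunking after injection is what keeps every chunk under the limit.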

williamhogman commented 1 year ago

@i1i1 is taking a look as well :)

Juzov commented 1 year ago

ah even better :)

Juzov commented 1 year ago

For reference, this is where my problem is: https://github.com/sobelio/llm-chain/blob/afaf1e335a4c0d016a2f5809a429c9a5f9492fd0/llm-chain/src/chains/map_reduce.rs#L82 (the base parameters aren't passed in)

i1i1 commented 1 year ago

So far the document splitter looks okay, and it seems the issue might be related to the OpenAI text splitter's tokenizer: https://github.com/sobelio/llm-chain/blob/main/llm-chain-openai/src/chatgpt/text_splitter.rs#L29

I'm not good at that, but I think the issue is related to using the wrong tokenizer.

i1i1 commented 1 year ago

Found the issue. We do not account for prompt tokens here: https://github.com/sobelio/llm-chain/blob/afaf1e335a4c0d016a2f5809a429c9a5f9492fd0/llm-chain/src/frame.rs#L47

williamhogman commented 1 year ago

@i1i1 Nice find! So basically the problem is that the map-reduce chain doesn't count the Document + the prompt together? :)

i1i1 commented 1 year ago

Yes!
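The accounting described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the fix that landed in `llm-chain`: tokens are approximated by whitespace-separated words, and `count_tokens`, `document_budget`, and `split_to_fit` are hypothetical names. The idea is that the room left for document text is the context window minus the prompt tokens minus whatever is reserved for the model's answer.

```rust
/// Stand-in token counter: counts whitespace-separated words.
/// A real implementation would use the model's own tokenizer.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Tokens left for document text once the prompt and the space reserved
/// for the completion are subtracted from the context window.
/// Returns `None` when nothing is left.
fn document_budget(
    context_window: usize,
    prompt_tokens: usize,
    reserved_for_answer: usize,
) -> Option<usize> {
    context_window
        .checked_sub(prompt_tokens)?
        .checked_sub(reserved_for_answer)
        .filter(|&remaining| remaining > 0)
}

/// Splits `document` so that prompt + chunk + reserved answer tokens
/// never exceed `context_window`.
fn split_to_fit(
    document: &str,
    prompt: &str,
    context_window: usize,
    reserved_for_answer: usize,
) -> Option<Vec<String>> {
    let budget = document_budget(context_window, count_tokens(prompt), reserved_for_answer)?;
    let words: Vec<&str> = document.split_whitespace().collect();
    // `budget` is non-zero here, so `chunks` cannot panic.
    Some(words.chunks(budget).map(|chunk| chunk.join(" ")).collect())
}
```

Returning `Option` makes the "prompt alone already fills the window" case explicit instead of silently producing oversized requests.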

williamhogman commented 1 year ago

Do you want to try to fix it? We can help you on Discord if you run into problems :)

MapReduce chains might need to stop using frame because of this.