lamalab-org / matextract-book

http://matextract.pub/
MIT License

Add some comments about LLM chunking #158

Closed MrtinoRG closed 3 weeks ago

MrtinoRG commented 3 weeks ago

Used here, for example: https://arxiv.org/pdf/2410.03341

They simply prompt a model to chunk a text into pieces of a certain number of tokens. To ensure the output is consistent with this number, they cap the model's output context length at the number of tokens they want each chunk to have.
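A minimal sketch of this idea (my reading of the paper; the prompt wording, function names, and model name are illustrative assumptions, not taken from the paper):

```python
# Sketch of LLM-based chunking: ask the model for one chunk of ~N tokens,
# and enforce the limit by capping max output tokens at N.

def build_chunking_prompt(text: str, target_tokens: int) -> str:
    """Prompt the model to return one coherent chunk of at most `target_tokens` tokens."""
    return (
        f"Split the following text. Return only the first coherent chunk "
        f"of at most {target_tokens} tokens, ending at a natural boundary "
        f"(sentence or paragraph):\n\n{text}"
    )

def llm_chunk(text: str, target_tokens: int, client, model: str = "gpt-4o-mini") -> str:
    # Capping max_tokens at the target chunk size is the trick described above:
    # the model physically cannot emit a chunk longer than the output window.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_chunking_prompt(text, target_tokens)}],
        max_tokens=target_tokens,  # hard cap = desired chunk length
    )
    return response.choices[0].message.content
```

The `max_tokens` cap only bounds chunk length from above; getting chunks that end at clean boundaries still depends on the prompt and the model following it.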

MrtinoRG commented 3 weeks ago

One will need either funds for the API calls or some computing power, but it can be an easy solution for chunking.

kjappelbaum commented 3 weeks ago

lol, again "just ask the LLM to do it"