The Cohere summarize API has a limit of 100K characters. This is fine for most articles, but we want to be able to handle larger texts as well.
Some solutions to explore:
Tokenize into sentences and do a "center crop" - this assumes that the least information-dense parts of an article are its ends, which isn't always the case.
Chunk text into sub-100K chunks, summarize them separately, then weave the summaries back together. This can be costly for large texts, so if we do it we should warn the user. Chunking could also be done on other boundaries, like sections or chapters.
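To make the two options concrete, here is a rough Python sketch of both. The function names (`split_sentences`, `center_crop`, `chunk_text`), the `MAX_CHARS` constant, and the regex-based sentence splitter are all assumptions made for illustration and are not part of the existing code. The actual summarize call and the "weave back together" step are left out, since they depend on how the client is wired up.

```python
import re

MAX_CHARS = 100_000  # character limit mentioned above


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def center_crop(text, max_chars=MAX_CHARS):
    """Drop sentences alternately from the end and the start until the text fits."""
    sentences = split_sentences(text)
    lo, hi = 0, len(sentences)
    # Length of the sentences joined by single spaces.
    total = sum(len(s) for s in sentences) + max(len(sentences) - 1, 0)
    drop_from_end = True
    while total > max_chars and hi - lo > 1:
        if drop_from_end:
            hi -= 1
            total -= len(sentences[hi]) + 1
        else:
            total -= len(sentences[lo]) + 1
            lo += 1
        drop_from_end = not drop_from_end
    # Hard cut as a fallback if a single remaining sentence is still too long.
    return " ".join(sentences[lo:hi])[:max_chars]


def chunk_text(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    chunks, current = [], ""
    for sentence in split_sentences(text):
        # Hard-split any single sentence that exceeds the limit on its own.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

With chunking, `len(chunk_text(text))` gives the number of summarize calls up front, which could be used to warn the user before any cost is incurred. Splitting on sections or chapters would only need a different splitter in place of `split_sentences`.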