New Features - Githubissues

Thanks to visarga for the suggestions:

summary, for better topic embedding
named entities, for knowledge base
implicit tasks present in the text: which tasks could an LLM learn from a given example?
chain-of-thought (CoT) augmentation, to bring out implicit deductions; the Phi-1.5 and Orca papers suggest that synthetic CoT datasets make superior training material
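
To make the four suggestions above concrete, here is a minimal sketch of how they might be attached to a document as extra fields. Everything in it is an assumption for illustration: the `AugmentedDocument` record, its field names, and the `augment` and `to_jsonl_line` helpers are hypothetical and are not part of RedPajama-Data. The annotators are plain callables, so a summarizer, an NER model, or an LLM prompt could be plugged in later.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Callable, Dict, List


@dataclass
class AugmentedDocument:
    """Hypothetical record: the raw text plus the four proposed annotations."""
    text: str
    summary: str = ""                                         # for topic embedding
    named_entities: List[str] = field(default_factory=list)   # for a knowledge base
    implicit_tasks: List[str] = field(default_factory=list)   # tasks an LLM could learn
    chain_of_thought: str = ""                                # explicit reasoning steps


def augment(text: str, annotators: Dict[str, Callable[[str], object]]) -> AugmentedDocument:
    """Run each annotator on the text and store its result under the matching field."""
    allowed = {f.name for f in fields(AugmentedDocument)} - {"text"}
    doc = AugmentedDocument(text=text)
    for name, annotate in annotators.items():
        if name not in allowed:
            raise KeyError(f"unknown annotation field: {name}")
        setattr(doc, name, annotate(text))
    return doc


def to_jsonl_line(doc: AugmentedDocument) -> str:
    """Serialize one record as a single JSON line."""
    return json.dumps(asdict(doc), ensure_ascii=False)


if __name__ == "__main__":
    # Placeholder annotators only; real ones would call a model.
    stub_annotators = {
        "summary": lambda t: t.split(".")[0],
        "named_entities": lambda t: [w.strip(".,") for w in t.split() if w[0].isupper()],
    }
    print(to_jsonl_line(augment("Paris is the capital of France.", stub_annotators)))
```

Keeping each annotation as an optional field means a pipeline could add them incrementally, and records without a given annotation stay valid.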

togethercomputer / RedPajama-Data