mlederbauer / glossagen

create a glossary out of your manuscript in materials and chemistry – instantly
https://mlederbauer.github.io/glossagen
MIT License
11 stars, 1 fork

Feat/25 uniform chunking #29

Closed: mlederbauer closed this 3 months ago

mlederbauer commented 3 months ago

Implemented a `chunk_size` parameter in the glossary generator, which is also logged to wandb. It replaces the previous scheme, in which each chunk held 100/n percent of the paper, with a more coherent parameter: the chunk length no longer depends on the length of the manuscript, which makes large documents easier to chunk. I also log the number of generated glossary entries, and the runs show that the number of entries decreases as the chunk size increases.
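A minimal sketch of the difference between the two chunking schemes follows. It assumes plain character-based splitting with no overlap between chunks. The function names `chunk_by_size` and `chunk_by_fraction` are hypothetical and are not taken from the glossagen code, which may split on tokens, words, or sentences instead.

```python
def chunk_by_size(text: str, chunk_size: int) -> list[str]:
    """Uniform chunking (hypothetical helper): consecutive chunks of at most
    chunk_size characters. Longer documents yield more chunks, not larger ones."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_by_fraction(text: str, n: int) -> list[str]:
    """Illustration of the older idea (hypothetical helper): exactly n chunks,
    each holding roughly 100/n percent of the text, so chunk length grows
    with document length."""
    if n <= 0:
        raise ValueError("n must be positive")
    length = len(text)
    # Integer boundaries spread any remainder across the chunks.
    return [text[length * i // n : length * (i + 1) // n] for i in range(n)]


if __name__ == "__main__":
    for length in (1_000, 10_000):
        doc = "x" * length
        # Columns: document length, number of fixed-size chunks, length of one fractional chunk.
        print(length, len(chunk_by_size(doc, 500)), len(chunk_by_fraction(doc, 4)[0]))
    # 1000 2 250
    # 10000 20 2500
```

With a fixed `chunk_size`, a ten times longer manuscript produces ten times as many chunks of the same length. The fractional scheme keeps the chunk count fixed and lets each chunk grow with the document.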