larsgw / citation.js

Citation.js converts formats like BibTeX, Wikidata JSON and ContentMine JSON to CSL-JSON to convert to other formats like APA, Vancouver and back to BibTeX.
https://citation.js.org/
MIT License
219 stars, 30 forks

File size limit? #208

Closed. disastrid closed this issue 3 years ago

disastrid commented 3 years ago

When using the CLI to parse a huge archive (25k lines, about 1.6k items, ~2.2 MB), I found that parsing never completes, so the JSON output isn't generated; it just hangs. If I break the file up into much smaller chunks (~500 lines) I can get it to work, but I can't seem to find anything in the docs about file size limits. This would be really useful to know.

Thanks!
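
For anyone who needs the same workaround, the chunking step can be scripted rather than done by hand. The following is only a minimal sketch, not part of Citation.js: it assumes the archive is BibTeX, that every entry starts with `@` at the beginning of a line, and that no line inside a field value starts with `@`. The function names `split_bibtex_entries` and `chunk_entries` are hypothetical helpers made up for this illustration.

```python
from typing import Iterator, List


def split_bibtex_entries(text: str) -> List[str]:
    """Split BibTeX source into entries.

    Assumption: each entry begins with '@' in the first column, and no
    line inside a field value begins with '@'.
    """
    entries: List[str] = []
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        # A line starting with '@' opens a new entry, so flush the previous one.
        if line.startswith("@") and current:
            entries.append("".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("".join(current))
    # Drop pieces that are only whitespace (e.g. leading blank lines).
    return [entry for entry in entries if entry.strip()]


def chunk_entries(entries: List[str], size: int) -> Iterator[str]:
    """Yield strings that each contain at most `size` whole entries."""
    for start in range(0, len(entries), size):
        yield "".join(entries[start:start + size])
```

Each yielded chunk can be written to its own file and passed to the CLI separately. With the numbers above (25k lines for about 1.6k items, so roughly 16 lines per item), a chunk of ~500 lines corresponds to about 30 entries. Note that this naive split treats `@string` and `@preamble` blocks as ordinary entries, so a file that relies on `@string` abbreviations would need those definitions copied into every chunk.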

larsgw commented 3 years ago

There isn't a hard file size limit, so I guess this is more a limitation of the current implementation. What kind of file are you parsing?

larsgw commented 3 years ago

Confirmed now, thanks. Just for my own reference, the first step of parsing does work fine (expectedly, as it has been tested repeatedly on a ~5.2MB file), but either parsing values or translating the Bib(La)TeX schema to CSL-JSON goes wrong.

larsgw commented 3 years ago

Ah, I've found the problem (https://github.com/citation-js/citation-js/issues/114). It'll be fixed in the next release.


I don't know if you're interested in the technicalities behind this, but I profiled the memory usage, and one surprising thing is that a 21 kB abstract takes up 300 kB, at least for some time. This leads to an 11 MB array taking up 44 MB. This was not the main problem, however.

[Image: Screenshot_20210117_133119]

larsgw commented 3 years ago

This issue should be fixed in v0.5.0-alpha.10. If not, please re-open the issue.