johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://miller.readthedocs.io

JSON parser uses lots of memory and takes twice the time of `jq` #1276

Status: Open. tooptoop4 opened this issue 1 year ago.

tooptoop4 commented 1 year ago

For a 100 MB file (10k records), `mlr --ijson count <file>` takes 2 minutes and uses 14 GB of memory, while `jq length` takes 1 minute and uses 5 GB.
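A minimal way to reproduce this kind of measurement in-process is sketched below in Python, using only the standard library. It is an illustration under assumptions, not how Miller or jq parse JSON internally. It assumes the input is either one top-level array of objects or a stream of concatenated top-level objects, and `make_records`, `count_records`, and `measure` are hypothetical helper names. For the streamed layout it decodes one record at a time with `json.JSONDecoder.raw_decode` and then drops it, so the parsing overhead should stay roughly at one record's worth of Python objects. `tracemalloc` reports only the Python-heap peak reached while counting, not whole-process RSS.

```python
import json
import time
import tracemalloc


def make_records(n, payload_chars, as_array=False):
    # Hypothetical synthetic data: n flat records, each padded with a
    # payload string of payload_chars characters.
    records = (json.dumps({"id": i, "payload": "x" * payload_chars}) for i in range(n))
    if as_array:
        # Layout 1: a single top-level JSON array of objects.
        return "[" + ",".join(records) + "]"
    # Layout 2: concatenated top-level objects, one per line.
    return "\n".join(records)


def count_records(text):
    """Count top-level JSON records in text (assumed layouts, see above)."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        # Top-level array: decode it whole and take its length, roughly what
        # `jq length` reports for an array input. All records are held at once.
        # (A stream whose first record is itself an array would land here too.)
        return len(json.loads(stripped))
    decoder = json.JSONDecoder()
    count = 0
    pos = 0
    end = len(stripped)
    while pos < end:
        # raw_decode parses one value starting at pos and returns the index
        # just past it; the decoded record is discarded immediately.
        _, pos = decoder.raw_decode(stripped, pos)
        count += 1
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and stripped[pos].isspace():
            pos += 1
    return count


def measure(text):
    # Tracing starts after `text` already exists, so the peak reflects
    # allocations made while counting, not the raw input itself.
    tracemalloc.start()
    t0 = time.perf_counter()
    n = count_records(text)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return n, elapsed, peak
```

For example, `measure(make_records(10_000, 10_000))` builds roughly 100 MB of text, similar in shape to the file described above; much smaller sizes are enough to compare the peak for the array layout (`as_array=True`) against the streamed one.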

tooptoop4 commented 1 year ago

I found https://github.com/TeskaLabs/cysimdjson to be the best option for just counting records, and https://github.com/python-rapidjson/python-rapidjson for accessing all the fields.
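As a hedged sketch of the "accessing all the fields" case: python-rapidjson provides a `loads()` function that mirrors the standard `json.loads`, so it can be swapped in as the parser with a fallback to the standard library when it is not installed. The helper name `summarize_fields` is hypothetical, and the input is assumed to be a single top-level array of flat objects. This does not cover cysimdjson's API.

```python
import json

try:
    # python-rapidjson exposes a json-compatible loads(); use it if installed.
    import rapidjson as json_backend
except ImportError:
    # Fall back to the standard library so the sketch still runs.
    json_backend = json


def summarize_fields(text):
    """Parse a top-level JSON array of objects and visit every field.

    Returns (record_count, sorted list of all field names seen).
    """
    records = json_backend.loads(text)
    fields = set()
    for rec in records:
        # Touch every key/value pair, i.e. full field access per record.
        for key, _value in rec.items():
            fields.add(key)
    return len(records), sorted(fields)
```

Unlike a pure count, this path has to materialize every record as a dictionary, which is where a faster parser such as python-rapidjson would matter most.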