kislyuk / yq

Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
https://kislyuk.github.io/yq/
Apache License 2.0

TOML support (tomlkit) is very slow #184

Status: Closed (intelfx closed this issue 3 months ago)

intelfx commented 4 months ago

Parsing a ~700 KiB TOML file with tomlq takes more than 20 s on my ~4 GHz laptop:

$ curl -fsSL 'https://static.rust-lang.org/dist/channel-rust-nightly.toml' | wc -c | bscalc -H
753.85 KiB

$ time curl -fsSL 'https://static.rust-lang.org/dist/channel-rust-nightly.toml' | tomlq >/dev/null
curl -fsSL 'https://static.rust-lang.org/dist/channel-rust-nightly.toml'  0,02s user 0,00s system 3% cpu 0,645 total
tomlq > /dev/null  21,56s user 0,08s system 97% cpu 22,256 total

Perhaps consider a different TOML library if style-preserving features (i.e. round-trip output) are not required?

kislyuk commented 4 months ago

Thanks for letting me know.

I'm beginning to sour on TOML as a file format. I think it's not a great file format. I'm curious if your use case involves writing toml or just extracting values from it? I wonder if I should just discontinue support for writing TOML (tomlq -t) and only support reading. That would make the choice of library obvious - https://docs.python.org/3/library/tomllib.html which is approximately 70 times faster on my system.

intelfx commented 4 months ago

> I'm curious if your use case involves writing toml or just extracting values from it?

No, just extracting values. In fact, the example I gave above is my actual use case.

> I'm beginning to sour on TOML as a file format. I think it's not a great file format.

In fact, I very much share your opinion :-) Unfortunately, the Rust ecosystem uses it quite widely (not to mention Python's very own PEP 517 and its descendants), and I need to integrate certain processes with it, so here we are.

> That would make the choice of library obvious - https://docs.python.org/3/library/tomllib.html which is approximately 70 times faster on my system.

Yes, but it should be noted that this module is only available in Python 3.11+, so there will likely have to be some sort of fallback, at least for the time being.

kislyuk commented 3 months ago

I addressed this by using tomllib when available.