samuelcolvin / rtoml

A fast TOML library for Python, implemented in Rust.
https://pypi.org/project/rtoml/
MIT License

Benchmark against pytomlpp and tomli #24

Closed hukkin closed 3 years ago

hukkin commented 3 years ago

Added pytomlpp and Tomli to the benchmark. I re-ran the benchmark in a clean venv, using CPython 3.8.5 on an Intel i5-8350U CPU @ 1.70GHz.

codecov[bot] commented 3 years ago

Codecov Report

Merging #24 (7849a53) into main (a06824d) will not change coverage. The diff coverage is n/a.

@@            Coverage Diff            @@
##              main       #24   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            1         1           
  Lines           44        44           
  Branches         7         7           
=========================================
  Hits            44        44           

samuelcolvin commented 3 years ago

Looks good, I'll run it myself when I'm back at my computer, then merge.

samuelcolvin commented 3 years ago

Thanks so much, this is great.

Now you've made me spend half a day trying to make rtoml faster by removing the need for parse_datetime to be implemented in Python. :smile: :angry:

hukkin commented 3 years ago

Haha, is the goal to beat the C++ implementation? :smile:

samuelcolvin commented 3 years ago

Hard not to try when the difference is so small.

samuelcolvin commented 3 years ago

See #25, rtoml is now around 8% faster than pytomlpp.

I'll wait to update the benchmarks until I've released a new version and can use the manylinux binary for the benchmark, which is fairer.

hukkin commented 3 years ago

Awesome, congrats! Now waiting for pytomlpp to make their next move, haha.

samuelcolvin commented 3 years ago

v0.7 released and benchmarks updated, see ccd2c01a210bcd3b7cf0be8dbbdedab352476e32.

hukkin commented 3 years ago

Booo, for making the rest of us look bad. :smile:

Btw, I was toying around with different sorts of data a few days ago and discovered that rtoml performance suffers quite a lot with (IIRC) slightly larger TOML files (around 1000 lines).

I think the data I used was generated with something like

import uuid

# Emit 500 uniquely named nested tables, each holding one key (about 1000 lines of TOML).
for _ in range(500):
    print(f"[{uuid.uuid4()}.a.b]")
    print("key='val'")

Just an FYI. Maybe you're interested to see if there's an easy improvement that would fix that.
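
For anyone wanting to reproduce this, here is a rough sketch of how the data could be generated in memory and timed against any parser. The helper names `make_toml` and `time_parser` are made up for illustration and are not part of rtoml or its benchmark script; the table count and repeat count are arbitrary assumptions.

```python
import time
import uuid
from typing import Callable


def make_toml(n_tables: int = 500) -> str:
    """Build a TOML document with n_tables nested tables (hypothetical helper)."""
    lines = []
    for _ in range(n_tables):
        # Each UUID is a valid bare key, so this yields [<uuid>.a.b] table headers.
        lines.append(f"[{uuid.uuid4()}.a.b]")
        lines.append("key='val'")
    return "\n".join(lines) + "\n"


def time_parser(loads: Callable[[str], object], doc: str, repeats: int = 20) -> float:
    """Return the best wall-clock time in seconds of loads(doc) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        loads(doc)
        best = min(best, time.perf_counter() - start)
    return best
```

Assuming both packages are installed, something like `time_parser(rtoml.loads, make_toml())` and `time_parser(tomli.loads, make_toml())` should give comparable numbers; varying `n_tables` would show how each parser scales with file size.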

samuelcolvin commented 3 years ago

Interesting, I think the problem could be at:

https://github.com/samuelcolvin/rtoml/blob/ccd2c01a210bcd3b7cf0be8dbbdedab352476e32/src/lib.rs#L23

Really it would be better to create an object at the right size initially rather than dynamically change its size.

I'll try and see if there's a way to fix this with pyo3.
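
To illustrate the general idea only (this is plain Python, not the rtoml or pyo3 code at the linked line, and whether it is the actual bottleneck is unconfirmed), here is a minimal sketch contrasting a container that grows as items arrive with one allocated at its final size up front. Both function names are hypothetical.

```python
def build_by_growing(items):
    """Start empty and grow; the container may be resized several times along the way."""
    out = []
    for item in items:
        out.append(item)
    return out


def build_presized(items):
    """Allocate the final size once, then fill each slot in place."""
    items = list(items)  # the length must be known up front for this approach
    out = [None] * len(items)
    for i, item in enumerate(items):
        out[i] = item
    return out
```

Both return the same result; the second simply avoids intermediate resizes, which is the trade-off described above.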