stan-dev / cmdstanpy

CmdStanPy is a lightweight interface to Stan for Python users which provides the necessary objects and functions to compile a Stan program and fit the model to data using CmdStan.
BSD 3-Clause "New" or "Revised" License
149 stars 67 forks source link

JSON Encoding of data is too slow #733

Closed csmarchbanks closed 4 months ago

csmarchbanks commented 5 months ago

Summary:

In version 1.0.8 of cmdstanpy ujson support was removed causing significant CPU usage and time spent serializing data to JSON. It would be nice if the CPU time spent writing data to disk was reduced back to the 5% goal from https://github.com/stan-dev/cmdstanpy/issues/38.

Description:

We were profiling the performance of a workload using cmdstanpy and saw that 20+% of CPU for a production workload is spent serializing data to JSON via write_stan_json.

Additional Information:

There is excellent background information in https://github.com/stan-dev/cmdstanpy/issues/38 on the performance of writing JSON, as well as in https://github.com/stan-dev/cmdstanpy/pull/623 for why ujson was removed.

Current Version:

cmdstanpy = 1.1.0, but was introduced in 1.0.8.

WardBrian commented 5 months ago

Can you provide some (possibly fake) data which I can use for benchmarking when I look into this?

Partially the issue is that as Stan has gained new features (complex numbers, tuples, etc), the overhead for serialization has increased since these need bespoke representations in JSON

WardBrian commented 4 months ago

@csmarchbanks could you try running python -m pip install stanio==0.5.0 ujson (ignore any warnings about version incompatibilities for now) and see if you notice any improvement for your case?

csmarchbanks commented 4 months ago

Will do, I will look into if I have a good dataset that can be shared as well!

Thank you for the quick response.

WardBrian commented 4 months ago

You may also need cmdstanpy 1.2, I don't think 1.1 was depending on the stanio package yet

csmarchbanks commented 4 months ago

Initial results show ~65% reduction in the amount of time spent in write_stan_json! I will try to check again in the workload we saw taking 20% of time once the update is deployed, this workload only took 10% (now 3.5%).

WardBrian commented 4 months ago

Great @csmarchbanks! If the 20% workload doesn't see similar improvements it would be helpful to understand what kind/shape of data is being used to further optimize.

My changes so far were mostly around allowing ujson optionally and then some optimizations for large, homogeneous numpy arrays.

csmarchbanks commented 4 months ago

The next environment shows ~5% of CPU time spent in writing files now, so significantly improved as well. As discussed in the previous issue that seems like a reasonable enough value. Thanks again and I will close this!