mmcdermott / EventStreamGPT

Dataset and modelling infrastructure for modelling "event streams": sequences of continuous time, multivariate events with complex internal dependencies.
https://eventstreamml.readthedocs.io/en/latest/
MIT License

Option of using pyarrow when writing parquet #49

Closed juancq closed 1 year ago

juancq commented 1 year ago

I got a big performance boost by using pyarrow: write_parquet calls ran 85% faster on my dataset when running the build_dataset script. It may have to do with all the strings in my dataset.

 df.write_parquet(fp, use_pyarrow=True)

It would be helpful to have configuration options for setting the parameters of various polars functions.
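
As a rough illustration of what such an option could look like, here is a minimal sketch. The names `ParquetWriteConfig` and `write_df` are hypothetical and not part of EventStreamGPT; the only thing assumed about the dataframe is that it has a polars-style `write_parquet(fp, **kwargs)` method.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ParquetWriteConfig:
    """Hypothetical config object; not an existing EventStreamGPT class."""

    # Mirrors the `use_pyarrow` flag of polars' DataFrame.write_parquet.
    use_pyarrow: bool = False
    # Any other keyword arguments to forward unchanged to write_parquet.
    extra_kwargs: Dict[str, Any] = field(default_factory=dict)

    def to_kwargs(self) -> Dict[str, Any]:
        # Merge the explicit flag with the pass-through keyword arguments.
        return {"use_pyarrow": self.use_pyarrow, **self.extra_kwargs}


def write_df(df: Any, fp: Any, cfg: Optional[ParquetWriteConfig] = None) -> None:
    """Write `df` to `fp`, forwarding the configured keyword arguments.

    `df` is assumed to expose a polars-style `write_parquet` method.
    """
    cfg = cfg if cfg is not None else ParquetWriteConfig()
    df.write_parquet(fp, **cfg.to_kwargs())
```

With a polars DataFrame, `write_df(df, fp, ParquetWriteConfig(use_pyarrow=True))` would be equivalent to the call above, and the default config keeps the current behaviour.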