kindly / flatterer

Opinionated JSON to CSV/XLSX/SQLITE/PARQUET converter. Flattens JSON fast.
https://flatterer.opendata.coop
MIT License

Consider using rust_xlsxwriter instead of xlsxwriter #67

Open jpmckinney opened 4 months ago

jpmckinney commented 4 months ago

https://github.com/jmcnamara/rust_xlsxwriter is by the same author as the original Python library and the C library.

This project currently uses a wrapper around that C library.

I'm experiencing hard-to-debug errors like:

python: third_party/libxlsxwriter/third_party/dtoa/emyg_dtoa.c:446: emyg_dtoa: Assertion `!(sizeof (value) == sizeof (float) ? __isnanf (value) : sizeof (value) == sizeof (double) ? __isnan (value) : __isnanl (value))' failed.

The pure Rust version might be better (or, at least, might be easier to debug).
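The assertion text says the number formatter was handed a NaN. As a debugging aid, here is a minimal sketch that lists where non-finite numbers sit in a JSON document before it is converted. It assumes the NaN comes from the input data, which this thread does not confirm, and `non_finite_paths` is a hypothetical helper, not part of flatterer. It relies on Python's `json.loads` accepting the non-standard `NaN` and `Infinity` tokens by default.

```python
import json
import math


def non_finite_paths(node, path="$"):
    """Yield (path, value) for every NaN or infinite float in parsed JSON."""
    if isinstance(node, float) and not math.isfinite(node):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from non_finite_paths(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from non_finite_paths(value, f"{path}[{index}]")


# Made-up sample document, only for illustration.
doc = json.loads(
    '{"id": 1, "items": [{"price": NaN}, {"price": 2.5}], "total": Infinity}'
)
found = [path for path, _ in non_finite_paths(doc)]
print(found)
```

Running this over a failing input would at least show whether non-finite values are present and which fields hold them.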

kindly commented 4 months ago

@jpmckinney It looks like rust_xlsxwriter currently does not support low memory mode, which I think is a requirement for large files. See: https://github.com/jmcnamara/rust_xlsxwriter/issues/1 Once that is implemented, I think this would be a good move, as compiling the C version really complicates creating the Python wheels. To be honest, this has been the most painful part of flatterer.

I am wondering if there should be an option in this Python wrapper to use the Python version of libxlsxwriter (converting from CSV with the field information). This would be slower but much easier to debug.
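A rough sketch of what that option could look like, assuming the Python library is XlsxWriter and that the field information can be reduced to a mapping from column name to type. The `field_types` mapping, the type names, and the function names are hypothetical and do not reflect flatterer's actual output or API. XlsxWriter's `constant_memory` option is used so that rows are flushed as they are written.

```python
import csv
import io
import math


def coerce(value, field_type):
    """Convert one CSV string to a typed cell value; empty strings become None."""
    if value == "":
        return None
    if field_type == "number":
        try:
            number = float(value)
        except ValueError:
            return value  # not numeric after all: keep the text
        # XLSX cannot store NaN or infinity as numbers, so keep those as text.
        return number if math.isfinite(number) else value
    if field_type == "boolean":
        return value.lower() == "true"
    return value


def typed_rows(csv_file, field_types):
    """Yield the header row, then each data row coerced by column type."""
    reader = csv.reader(csv_file)
    header = next(reader)
    yield header
    types = [field_types.get(name, "text") for name in header]
    for row in reader:
        yield [coerce(value, kind) for value, kind in zip(row, types)]


def csv_to_xlsx(csv_path, xlsx_path, field_types):
    """Stream a CSV file into a single-sheet workbook."""
    import xlsxwriter  # third-party dependency, imported only when needed

    workbook = xlsxwriter.Workbook(xlsx_path, {"constant_memory": True})
    worksheet = workbook.add_worksheet()
    with open(csv_path, newline="") as csv_file:
        for index, row in enumerate(typed_rows(csv_file, field_types)):
            worksheet.write_row(index, 0, row)
    workbook.close()


# Small in-memory check of the coercion step (sample data is made up).
sample = io.StringIO("a,b,c\n1.5,true,x\nnan,,y\n")
rows = list(typed_rows(sample, {"a": "number", "b": "boolean"}))
print(rows)
```

Because `constant_memory` requires rows to be written in order, reading the CSV sequentially fits it naturally. The per-cell work happens in Python, so this path would be slower than the C wrapper, as noted above.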

jpmckinney commented 4 months ago

Aha, good to know. In our case, we can't really go slower – we have lots of big files that already use a lot of CPU.

jpmckinney commented 4 months ago

I found a way to reproduce the issue, so I'll open a new issue to keep it separate: https://github.com/kindly/flatterer/issues/68