mwouts / itables

Pandas DataFrames as Interactive DataTables
https://mwouts.github.io/itables/
MIT License
740 stars, 55 forks

Support downsampling only rows, not columns? #84

Closed: ofir-reich closed this issue 2 years ago

ofir-reich commented 2 years ago

I find that downsampling rows makes a lot of sense, whereas downsampling columns almost always makes me rerun the line and downsample the table myself. Is there a way to support this natively? I would argue that the default should be to downsample only rows, not columns. But even if it isn't, it would be great if this were configurable, so that I could opt into that behavior.

Thanks!
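The manual workaround mentioned in the request (trimming rows yourself before display) can be sketched in plain Python. This is a minimal illustration, assuming the table is a list of row lists and that keeping the first and last rows is the desired selection; `downsample_rows_only` is a hypothetical helper, not part of itables.

```python
from typing import List, Sequence


def downsample_rows_only(rows: Sequence[list], max_rows: int) -> List[list]:
    """Keep every column but at most `max_rows` rows.

    Hypothetical helper for illustration only (not an itables function).
    The first half of the row budget comes from the top of the table and
    the rest from the bottom; `max_rows <= 0` means "no limit".
    """
    if max_rows <= 0 or len(rows) <= max_rows:
        return list(rows)
    head = (max_rows + 1) // 2  # rows kept from the top
    tail = max_rows - head      # rows kept from the bottom
    # Guard tail == 0: rows[-0:] would return the whole table.
    bottom = list(rows[-tail:]) if tail else []
    return list(rows[:head]) + bottom


table = [[i, i * 10, i * 100] for i in range(10)]  # 10 rows x 3 columns
print(downsample_rows_only(table, 4))
# [[0, 0, 0], [1, 10, 100], [8, 80, 800], [9, 90, 900]]
```

Every row that survives keeps all of its columns, which is the behavior the request asks for.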

mwouts commented 2 years ago

Hi @ofir-reich, are you suggesting that column downsampling should occur only when the number of columns exceeds max_columns, and not when the table is too large in bytes? I guess the change to downsample.py should be fairly easy. Would you like to give it a try on your end?
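The rule proposed in the question above can be sketched as a function that picks the displayed shape. This is a simplified model, not the code in downsample.py: it assumes every cell costs the same `bytes_per_cell`, that a limit of 0 disables that limit, and `target_shape_rows_only` is a hypothetical name.

```python
from typing import Tuple


def target_shape_rows_only(
    n_rows: int, n_cols: int, bytes_per_cell: int, max_bytes: int, max_columns: int
) -> Tuple[int, int]:
    """Return the (rows, columns) to display under the proposed policy.

    Columns are cut only when n_cols exceeds max_columns; the byte budget
    is then enforced by dropping rows alone. A limit of 0 means "no limit".
    Hypothetical helper with a uniform per-cell size (an assumption).
    """
    cols = n_cols if max_columns <= 0 else min(n_cols, max_columns)
    rows = n_rows
    row_bytes = cols * bytes_per_cell
    if max_bytes > 0 and row_bytes > 0:
        # Keep as many full-width rows as fit in the budget (at least one).
        rows = min(n_rows, max(1, max_bytes // row_bytes))
    return rows, cols


print(target_shape_rows_only(500, 7, 8, 8000, 200))  # (142, 7)
print(target_shape_rows_only(500, 300, 8, 0, 200))   # (500, 200)
```

With these made-up numbers, a 500x7 table keeps all 7 columns and only loses rows, while a very wide table is cut to max_columns regardless of its size in bytes.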

mwouts commented 2 years ago

Hi @ofir-reich, I have a tentative implementation that downsamples rows when the table has more rows than columns, and columns when it has more columns than rows. Would you like to give it a try and report how it feels? You can install it with

pip install git+https://github.com/mwouts/itables.git@smart_downsampling

mwouts commented 2 years ago

For instance, the example in the docs, which used to be downsampled from 500x7 to 250x3, is now downsampled from 500x7 to 178x7, so all 7 columns are kept.
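The tentative rule described earlier in the thread (downsample rows when the table has more rows than columns, columns otherwise) can be approximated as follows. This is a sketch under simplifying assumptions, not the code from the smart_downsampling branch: every cell is assumed to cost the same `bytes_per_cell`, the fallback to a square table when trimming one dimension is not enough is an assumption of this sketch, and `smart_target_shape` is a hypothetical name. Because the real size estimate differs, it will not reproduce the exact 178x7 shape from the docs example.

```python
from math import isqrt
from typing import Tuple


def smart_target_shape(
    n_rows: int, n_cols: int, bytes_per_cell: int, max_bytes: int
) -> Tuple[int, int]:
    """Shrink the longer dimension first so the table fits in max_bytes.

    Hypothetical helper; max_bytes <= 0 disables downsampling.
    """
    if max_bytes <= 0 or n_rows * n_cols * bytes_per_cell <= max_bytes:
        return n_rows, n_cols
    max_cells = max(1, max_bytes // bytes_per_cell)
    long_side, short_side = max(n_rows, n_cols), min(n_rows, n_cols)
    if max_cells // short_side >= short_side:
        # Trimming the longer dimension alone is enough, and it stays
        # at least as long as the shorter one.
        long_side = max_cells // short_side
    else:
        # Even a short_side x short_side table is too big: use a square.
        long_side = short_side = max(1, isqrt(max_cells))
    # Map back: rows were the long side if the table was tall (or square).
    return (long_side, short_side) if n_rows >= n_cols else (short_side, long_side)


print(smart_target_shape(500, 7, 8, 8000))  # (142, 7): rows are trimmed
print(smart_target_shape(7, 500, 8, 8000))  # (7, 142): columns are trimmed
```

Compared with shrinking both dimensions at once, this keeps a tall, narrow table at full width, which matches the before/after shapes of the 500x7 example.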

mwouts commented 2 years ago

Hi @ofir-reich, I think this is a good addition to the library, so I have merged the PR, and the new downsampling approach will be available in itables==1.1.2.

ofir-reich commented 2 years ago

Wow, thanks! Didn't even get a chance to test it!