Closed: ofir-reich closed this issue 2 years ago
Hi @ofir-reich, are you suggesting that column downsampling should occur only when the number of columns exceeds max_columns, and not when the number of bytes is too large? I guess the change to downsample.py should be fairly easy. Would you like to give it a try on your end?
Hi @ofir-reich, I have a tentative implementation that downsamples rows when the table has more rows than columns, and columns when it has more columns than rows. Would you like to try it out and report how it feels? You can install it with
pip install git+https://github.com/mwouts/itables.git@smart_downsampling
For instance, the 500x7 example in the docs, which was previously downsampled to 250x3, is now downsampled to 178x7.
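The longer-axis-first idea can be sketched roughly like this (a minimal illustration only, not itables' actual downsample.py, which works from a byte budget and keeps both the head and the tail of the table; the `smart_downsample` name and the `max_cells` parameter are hypothetical):

```python
import numpy as np
import pandas as pd

def smart_downsample(df: pd.DataFrame, max_cells: int = 1250) -> pd.DataFrame:
    """Shrink the longer axis first until the table fits within max_cells cells."""
    rows, cols = df.shape
    while rows * cols > max_cells:
        if rows >= cols:
            # More rows than columns: keep only as many rows as fit.
            rows = max(1, max_cells // cols)
        else:
            # More columns than rows: keep only as many columns as fit.
            cols = max(1, max_cells // rows)
    return df.iloc[:rows, :cols]

# The 500x7 example from the docs: rows are trimmed, all 7 columns survive.
df = pd.DataFrame(np.zeros((500, 7)))
smart_downsample(df).shape  # (178, 7)
```

With a cell budget of 1250, a 500x7 table keeps all 7 columns and 1250 // 7 = 178 rows, matching the 178x7 result above.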
Hi @ofir-reich, I think this is a good addition to the library, so I have integrated the PR, and the new downsampling approach will be available in itables==1.1.2.
Wow, thanks! Didn't even get a chance to test it!
I find that downsampling rows makes a lot of sense, but downsampling columns almost always makes me rerun the line and downsample myself. Is there a way to support this natively? I would argue that the default should be to downsample only rows, not columns; but even if that doesn't become the default, it would be great if the behavior were configurable, so that I could opt into it.
Thanks!
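If itables exposes per-axis limits in its options module (the option names below and the "0 lifts the limit" convention are assumptions on my part; please check the itables documentation), a row-only policy might be configurable along these lines:

```python
import itables.options as opt

# Hedged configuration sketch: cap the number of rows shown, and lift
# the column limit so columns are never downsampled.
# (Option names and the "0 disables the limit" convention are
# assumptions; verify against the itables documentation.)
opt.maxRows = 200
opt.maxColumns = 0
```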