mwouts / itables

Pandas DataFrames as Interactive DataTables
https://mwouts.github.io/itables/
MIT License

Inf loading tables for particular `polars` data frame #291

Closed jmakov closed 3 months ago

jmakov commented 3 months ago

I have a data frame with 300 columns and 1000 rows. When I try to display it, all I get is the message "Loading ITables v2.1.1 from the init_notebook_mode". The first thing I do after the imports is call itables.init_notebook_mode(all_interactive=True), and I can display any other DF normally. Not sure how to debug the problem.

on46zohu commented 3 months ago

Do you mean it shows nothing after the message "Loading ITables v2.1.1 from the init_notebook_mode"? If that is the case, it is because your dataframe is larger than itables can handle. For example, I see the same thing for tables larger than 15 MB (the table size, not the file size).

jmakov commented 3 months ago

Yes, that's what it shows. I'm not sure that the size is the problem, since a DF of equal size, before a bunch of transformations, shows without a problem. Also, lf.collect().sample(100) results in the same behavior, and lf.collect().sample(100).estimated_size("kb") returns 3.515625. So I guess this particular DF is somehow problematic.

mwouts commented 3 months ago

Hi @jmakov, thanks for reporting this.

It may indeed be the size of the HTML table that is the problem. What browser are you using? Would you have a simple reproducible example that generates a similarly large table?

Also, did you change maxBytes? You can run into this kind of problem if you deactivate the downsampling and display large tables (as your notebook/HTML page becomes unusually large).

jmakov commented 3 months ago

Firefox 127.0. I'm not sure how to reproduce it without burning a lot of time. I have a lazy frame lf1 of shape (434881, 283) that can be displayed without a problem, but after I do some operations on it to get lf2, which has roughly the same shape (424882, 298), it can't be displayed. I did try maxBytes=6_400_000_000, with the same result (although with a much longer processing time). Even if I do lf.collect().sample(10), which produces a DF of shape (10, 298), the problem remains.

edit: this seems to be a problem connected with the polars data frame, as .to_pandas() displays the results as expected

mwouts commented 3 months ago

I see. I am also aware of another issue with polars at the moment, #290, but the symptoms are different (the table displays object rather than the actual data). Not sure whether that's related, but could you please try the following code:

from itables.javascript import datatables_rows

datatables_rows(df)

and check whether the output (a two-dimensional array) looks "sane" on your problematic DF? By "sane" I mean that it should contain only simple types (ints, floats, str) that can be transferred more or less verbatim to javascript.

Or, if you're familiar with html, you could use to_html_datatable and inspect the resulting html file:

from itables import to_html_datatable

with open('problematic_table.html', 'w') as fp:
    fp.write(to_html_datatable(df))

Please let me know what you find!
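The "sane" check described above (only ints, floats and strings in every cell) can be sketched in plain Python. This is a minimal illustration, not part of itables: the helper name find_non_simple_cells is hypothetical, the data is made up, and it assumes the rows are already available as plain Python lists rather than as a serialized string.

```python
from typing import Any, Iterable

# Types that map directly onto JavaScript/JSON scalars.
SIMPLE_TYPES = (int, float, str, bool, type(None))


def find_non_simple_cells(rows: Iterable[Iterable[Any]]) -> list[tuple[int, int, str]]:
    """Return (row, column, type name) for every cell that is not a simple scalar."""
    offenders = []
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if not isinstance(cell, SIMPLE_TYPES):
                offenders.append((i, j, type(cell).__name__))
    return offenders


# Made-up rows for illustration only; the last cell is a nested list.
rows = [
    [1, 0.5, "a"],
    [2, 1.5, ["nested"]],
]
print(find_non_simple_cells(rows))
```

An empty result means every cell is a simple scalar; anything else points at the row and column worth inspecting first.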

jmakov commented 3 months ago

I'm only using polars.UInt64, Int64 or Float32, no complicated/nested structs.

from itables import javascript

javascript.datatables_rows(lf.collect().sample(100))
# '[[BigInt("1666767918216000000"), 1699300000000, 0.BigInt("9510565400123596"), -0.BigInt("30901700258255005") ...

lf.dtypes
# [UInt64, Int64, Float32, Float32, .. (and 300 other Float32 cols)

mwouts commented 3 months ago

Thanks @jmakov, that's very helpful. I have been able to reproduce the problem and I think I have a fix. Would you mind trying the PR above? Detailed instructions will come with the PR. Thanks!
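In the datatables_rows output reported earlier in this thread, BigInt(...) appears spliced into the fractional part of the floats (0.BigInt("9510565400123596")), which is not valid JavaScript. One way output of that shape can arise is a regex that wraps every long digit run without checking whether the run belongs to a float. The sketch below only illustrates that mechanism; it makes no claim about the actual itables code or about the fix in the PR. The function names and patterns are hypothetical, and the payload is reconstructed from the reported output with the BigInt wrappers removed.

```python
import re


def wrap_naive(payload: str) -> str:
    # Hypothetical: wraps ANY run of 16+ digits, including the digits
    # that follow a decimal point inside a float.
    return re.sub(r"(\d{16,})", r'BigInt("\1")', payload)


def wrap_integers_only(payload: str) -> str:
    # Hypothetical alternative: skip digit runs that touch a '.' or
    # another digit, so the fractional part of a float is left alone.
    return re.sub(r"(?<![\d.])(\d{16,})(?![\d.])", r'BigInt("\1")', payload)


payload = "[[1666767918216000000, 1699300000000, 0.9510565400123596]]"
print(wrap_naive(payload))
print(wrap_integers_only(payload))
```

The first call reproduces the malformed 0.BigInt("...") token, while the second wraps only the 19-digit integer and leaves the float untouched.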

jmakov commented 3 months ago

Sure, I can try it if I have instructions on how to install/build the thing :), no problem!

jmakov commented 3 months ago

@mwouts I tried it and it works. Thank you for the quick fix!

mwouts commented 3 months ago

Awesome, thank you for keeping me posted! I'll release that fix then.