Open softwaredoug opened 1 month ago
For now I am creating a stub implementation of `unique` that returns the current (non-unique) array as a hacky workaround, since `unique` is practically unsupported in my case.
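A minimal sketch of that workaround, in plain Python so it runs without pandas. `InvertedIndexArray` and its `values_visited` counter are hypothetical stand-ins for illustration, not SearchArray's actual code. The sketch assumes the summary path asks the array for `unique()`. In a real pandas `ExtensionArray` subclass the override would live on that class instead.

```python
class InvertedIndexArray:
    """Hypothetical stand-in for an extension array whose values are costly to read."""

    def __init__(self, docs):
        self._docs = list(docs)
        self.values_visited = 0  # how many values have been materialized so far

    def __len__(self):
        return len(self._docs)

    def __getitem__(self, i):
        # Simulates the expensive "uninvert one row" step.
        self.values_visited += 1
        return self._docs[i]

    def unique(self):
        # Stub: skip deduplication and hand back the array unchanged.
        # The result may contain duplicates, so any count derived from it
        # is only an upper bound on the true number of distinct values.
        return self


arr = InvertedIndexArray(["foo bar", "foo bar", "baz"])
stub_result = arr.unique()
print(len(stub_result), arr.values_visited)
```

The trade-off is that anything relying on `unique` for correctness (for example a distinct-value count in a summary) will be wrong, which is why this is only a stopgap.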
Thanks for this - tracking internally at b/345484881
Describe the current behavior
I'm using a pandas extension array called SearchArray. When I create a column of this type and then have Colab display the dataframe (as in just typing the dataframe bare and expecting HTML output in a notebook), Colab takes upwards of 6-7 minutes to execute.
What seems to be happening is that Colab calls `_summarize_dataframe`, which in turn calls `nunique`/`unique`, which visits every value. For extension arrays, it's not safe to assume you can cheaply visit every value (in my case the data is in an inverted index; uninverting is costly, feasible for a few rows but hard on the entire DataFrame).
Describe the expected behavior
Visit only the rows that will be displayed when serializing to string. Perhaps fall back to plain `_repr_html_` for extension types, if possible.
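To illustrate the requested behavior, here is a rough sketch in plain Python of a preview that touches only the rows it will show. `CountingColumn` and `render_preview` are hypothetical names invented for this example. This is not how Colab or pandas implement display, just the shape of the idea.

```python
class CountingColumn:
    """Hypothetical column that counts how many values are read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        self.reads += 1  # each read stands in for one costly lookup
        return self._values[i]


def render_preview(column, max_rows=10):
    """Stringify only the head and tail rows that would be displayed."""
    n = len(column)
    if n <= max_rows:
        indices = list(range(n))
    else:
        head = max_rows // 2
        tail = max_rows - head
        indices = list(range(head)) + list(range(n - tail, n))
    return [str(column[i]) for i in indices]


col = CountingColumn(f"doc {i}" for i in range(100_000))
preview = render_preview(col, max_rows=10)
print(len(preview), col.reads)
```

With this approach the cost of display scales with the number of rows shown rather than with the size of the column.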
What web browser you are using (Chrome, Firefox, Safari, etc.)
Chrome
Additional context
Notebook https://colab.research.google.com/drive/12B5K2Kb4o8djZQV54afRjPPSd9vTiNNs?authuser=1#scrollTo=H6mWXxGhxNFg