Open ahuang11 opened 10 months ago
I think we've already discussed that in the past. IIRC it's pretty difficult to stop things from running. I think we discussed removing columns from `by` and `groupby` that have too many unique values.
I was imagining this https://github.com/holoviz/panel/pull/5962
+1. I am trying to build a custom data catalogue inspired by Intake. I would like my users to be able to explore a source using the Explorer, but if I select a datetime column in `by` or `groupby`, everything suddenly blocks for a long time and sometimes even crashes the client or server.
One or more of the following would be very helpful.
I think it could be possible to implement a limit so that columns with `.nunique() > N` are not offered in the `by` or `groupby` selectors.
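To make the idea concrete, here is a minimal sketch of such a cardinality filter. It is not hvPlot's implementation: the function name `groupable_columns`, the `max_unique` parameter, and the sample data are all hypothetical, and it works on a plain mapping of column names to values so that it runs without extra dependencies.

```python
from typing import Hashable, Iterable, Mapping


def groupable_columns(
    columns: Mapping[str, Iterable[Hashable]], max_unique: int = 50
) -> list[str]:
    """Return the names of columns with at most `max_unique` distinct values.

    Hypothetical helper, not part of hvPlot or Panel.
    """
    return [
        name
        for name, values in columns.items()
        if len(set(values)) <= max_unique
    ]


# Hypothetical data: "timestamp" and "value" are unique per row,
# while "site" only has two distinct values.
data = {
    "timestamp": [f"2024-01-01T00:{m:02d}" for m in range(60)],
    "site": ["A", "B"] * 30,
    "value": list(range(60)),
}

# Only low-cardinality columns would be offered for `by` / `groupby`.
allowed = groupable_columns(data, max_unique=10)
print(allowed)
```

With a pandas DataFrame `df`, the same selection could be written as `df.columns[df.nunique() <= max_unique]`. The right threshold is a judgment call and would probably need to be configurable.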
The explorer is pretty different from the chat components of Panel. It doesn't fetch data from the internet, which is indeed the kind of operation that can get very slow (e.g. a rate-limited API) and that you may want to stop. Ideally the explorer should not have slow and blocking operations, as you can't explore data when things get too slow :) !
Sure, we could try to add a cancel button to stop processing, but before jumping on that kind of engineering solution I'd like to look into potential UX solutions. Like:
Agree with @maximlt: the solution is not to allow a user to back out of some nonsensical selection that will, at best, cause lengthy processing in Python or, at worst, crash your browser, but rather to prevent users from making such selections in the first place. Indeed, in many cases there is no backing out. Often the actual Python portion of the processing finishes very quickly, but once Bokeh is asked to render the output it effectively freezes the browser, at which point it's too late. I too suggest we do some of the following:
I agree that we should try to limit the risk of a user selecting something that blocks.
But I also believe that it would mean a lot if the plot could be generated in its own thread by default. As it is now, the explorer will not really work in a shared application because it will block the main thread over and over again for 0.1-5 secs.
Maybe Panel should provide its own implementation of `asyncify` to run something async / in its own thread in one line of code.
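For reference, the standard library already covers the one-line case with `asyncio.to_thread` (Python 3.9+). The sketch below is only an illustration of that pattern, not a proposed Panel API: `render_plot` is a hypothetical stand-in for a blocking plot computation, and the `time.sleep` simulates the work.

```python
import asyncio
import time


def render_plot(n_groups: int) -> str:
    """Hypothetical stand-in for a blocking plot computation."""
    time.sleep(0.2)  # simulate slow, blocking work
    return f"plot with {n_groups} groups"


async def main() -> tuple[str, int]:
    # Run the blocking call on a worker thread so the event loop stays free.
    task = asyncio.create_task(asyncio.to_thread(render_plot, 3))

    # Meanwhile the event loop can keep serving other callbacks.
    ticks = 0
    while not task.done():
        ticks += 1
        await asyncio.sleep(0.05)

    return await task, ticks


result, ticks = asyncio.run(main())
print(result, "- event loop iterations while waiting:", ticks)
```

Note that this keeps the event loop responsive while the worker thread sleeps or waits on I/O. CPU-bound pure-Python work in the thread still competes for the GIL, so the benefit depends on where the time is actually spent.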
Very doubtful that threading hvPlot would meaningfully unlock the GIL.
Sometimes I accidentally click on a column name with over 1000 unique groups for `by`, and then it takes forever unless I Cmd+C or restart the kernel.