Closed maartenbreddels closed 4 years ago
Hi @maartenbreddels, thanks for checking out the project!
At the moment, this limitation is by design because the px.histogram function maps directly to the plotly.js histogram trace type, which does all of the binning on the JavaScript side.
I think it would be nice to have some kind of server_side option to perform the binning in Python. In that case, we would display the results using a bar trace. If we implemented all of the bin function options, we might be able to make this the default.
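A minimal sketch of what binning in Python and displaying the result as a bar trace could look like. This is only an illustration under stated assumptions: `histogram_as_bar_trace` is a hypothetical helper, not part of plotly express, and its equal-width binning rule is not guaranteed to match what plotly.js does. It uses only the standard library and returns a plain dict in the shape of a bar trace.

```python
def histogram_as_bar_trace(values, nbins=10):
    """Hypothetical helper: bin `values` in Python and describe the
    result as a plotly-style bar trace (a plain dict)."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / nbins or 1.0
    counts = [0] * nbins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    # One bar per bin, centred on the bin and as wide as the bin.
    centers = [lo + (i + 0.5) * width for i in range(nbins)]
    return {"type": "bar", "x": centers, "y": counts, "width": width}


trace = histogram_as_bar_trace([0, 1, 1, 2, 3, 3, 3, 4], nbins=4)
# Only nbins numbers per axis are sent to the browser, not the raw values.
figure = {"data": [trace], "layout": {"bargap": 0}}
```

The resulting `figure` dict can be passed to `plotly.graph_objects.Figure(figure)` for display; the point is that its size depends on the number of bins rather than the number of rows.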
@nicolaskruchten interested in your thoughts on this when you're back in the office
Hi Jon!
OK, good to know. So that would only be the exception, I assume?
cheers,
Maarten
Hi both, sorry for the delay in responding.
I agree that for this kind of thing, leveraging "server-side" Python is a clear winner. My philosophy with px initially had been that it would do as little work as possible server-side, so as to provide a more coherent wrapper around plotly.js, but this kind of thing is an obvious limitation of this approach. The downside of adding a server_side flag to the aggregating functions like px.histogram, but also px.density_heatmap, px.density_contour and possibly even px.box, is twofold. First, we would have to implement in Python something that is ideally identical to the behaviour of the underlying JS library, which is a challenge. Second, it's not clear what trace type to use in the output. If we do server-side aggregation in px.histogram, do we then produce a figure composed of bar trace types? That seems a bit weird...
Migrated over to the main plotly repo: https://github.com/plotly/plotly.py/issues/2649
Hi,
Great project! It was on my wishlist to try it out with vaex (an out-of-core DataFrame alternative to pandas), which I've done here: https://github.com/vaexio/vaex/pull/383
However, when I try the histogram:
I notice that plotly asks vaex for all of the data (150 million rows) and adds it to the plotly Histogram object. Trying to send that to the browser fails (it crashes Chrome). I was expecting plotly express to do a groupby (which vaex would then handle instead of pandas) and send only the aggregated data. Is this a bug or a feature, and is this likely to change?
Regards,
Maarten
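To illustrate the behaviour expected in the post above, here is a minimal sketch of aggregating in chunks so that only the bin counts, never the raw rows, need to leave the Python side. It is an assumption-laden illustration: `count_in_chunks` and `fake_chunks` are hypothetical names, the bin limits are assumed to be known up front, and an out-of-core engine such as vaex would do this natively and far more efficiently.

```python
def count_in_chunks(chunks, lo, hi, nbins):
    """Hypothetical helper: accumulate histogram counts over an iterable
    of chunks without ever holding all of the values in memory."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for chunk in chunks:
        for v in chunk:
            if lo <= v <= hi:  # values outside the limits are ignored
                # Clamp so that v == hi falls into the last bin.
                idx = min(int((v - lo) / width), nbins - 1)
                counts[idx] += 1
    return counts


def fake_chunks():
    """Stand-in for a large dataset that is read piece by piece."""
    yield [0.1, 0.2, 0.9]
    yield [0.5, 0.55, 1.0]
    yield [2.0]  # outside the limits, so not counted


counts = count_in_chunks(fake_chunks(), lo=0.0, hi=1.0, nbins=2)
```

Whatever the number of rows, the result is a list of `nbins` integers, which is small enough to send to the browser.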