Open abf7d opened 1 year ago
You probably have some outliers in your data. Also, in Vaex the histogram bins are half open, [min, max). A dirty way to include the last value in the last bin is to use limits=[[xmin, xmax+eps], [ymin, ymax+eps], ...] where eps=1e-10, or ideally (1e-16/(xmin-xmax)). Does that make sense?
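To illustrate the half-open behaviour described above, here is a minimal pure-Python sketch. It is not Vaex's implementation; `bin_index` and `histogram` are hypothetical helpers that only mimic binning on [lo, hi). It assumes Python 3.9+ for `math.nextafter`, which is used here in place of a hand-picked eps.

```python
import math

def bin_index(value, lo, hi, nbins):
    """Bin index for value on the half-open range [lo, hi), or None if outside."""
    if not (lo <= value < hi):
        return None
    # Clamp to guard against floating-point rounding at the upper edge.
    return min(int((value - lo) / (hi - lo) * nbins), nbins - 1)

def histogram(values, lo, hi, nbins):
    """Count values per bin; values outside [lo, hi) are silently dropped."""
    counts = [0] * nbins
    for v in values:
        i = bin_index(v, lo, hi, nbins)
        if i is not None:
            counts[i] += 1
    return counts

data = [0.0, 0.5, 2.5, 4.0]
xmin, xmax = min(data), max(data)

# With limits exactly [xmin, xmax], the maximum is excluded from the last bin.
exact = histogram(data, xmin, xmax, 4)

# Nudging the upper limit to the next representable float keeps the maximum.
padded = histogram(data, xmin, math.nextafter(xmax, math.inf), 4)
```

With the exact limits the last bin stays empty even though the data reaches xmax; with the padded upper limit the maximum lands in the last bin.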
I think I understand. Let me clarify: by half open, do you mean that the bins go up to but don't include the max value? And I should add the eps calculation to my max values to include the max point?
Also, should that value be (1e-16/(xmax - xmin)) or (1e-16/(xmin - xmax))?
Yes, and yes :) and yes!
Thank you so much. I tried the formula provided, and it looks like eps is too small for one of my axes: it gets rounded off. When I tried eps=1e-10, it works. Again, I appreciate you pointing me in the right direction and your quick response!
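The rounding-off mentioned above is what happens when eps is smaller than the spacing between adjacent floats at the magnitude of the max value. The sketch below shows this with made-up values (1000.0 and 1e7 are illustrative, not taken from the issue) and hypothetical helper names. It assumes Python 3.9+ for `math.nextafter`.

```python
import math

def survives_addition(x, eps):
    """True if x + eps is a float strictly greater than x."""
    return x + eps > x

def smallest_upper_pad(x):
    """Next representable float above x (hypothetical helper)."""
    return math.nextafter(x, math.inf)

xmax = 1000.0

# An eps far below the float spacing near 1000.0 is lost in the addition.
tiny_ok = survives_addition(xmax, 1e-16)

# 1e-10 is larger than the spacing near 1000.0, so it survives.
fixed_ok = survives_addition(xmax, 1e-10)

# A fixed eps is not safe at every scale: near 1e7 the spacing exceeds 1e-10.
fixed_ok_large = survives_addition(1e7, 1e-10)

# Stepping to the next float works regardless of magnitude.
padded = smallest_upper_pad(xmax)
padded_large = smallest_upper_pad(1e7)
```

A fixed eps such as 1e-10 can therefore work on one axis and fail on another with larger values; stepping to the next representable float, or using an eps scaled to the data, sidesteps that.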
Description
I'm trying to bin a two-dimensional histogram using the df.count method. I want the histogram to be bounded by the min/max points of each axis; in other words, I want it to stretch across the whole chart. I expect at least one non-zero bin in every edge row and column. Instead, I get back histograms with multiple contiguous all-zero rows or columns along the border.
How do I generate a histogram of two columns where each edge contains the bounding min or max value for the row / column?
Here is an example of a histogram I generated that is not bounded by non-zero bins along the edges. The top, bottom, and right edges of this histogram have a lot of empty area:
The bin values match what is rendered in the chart:
In my code, I first get the limits:
Then I get and return the bins:
Software information