scikit-hep / histbook

Versatile, high-performance histogram toolkit for Numpy.
BSD 3-Clause "New" or "Revised" License

Confusing exception #25

Closed: imandr closed this issue 6 years ago

imandr commented 6 years ago

I know I may be doing something wrong here:

h = Hist(bin("a", 10, 0, 1), groupby("tt"))
h.fill(a=np.random.random((100,)), tt=["a"]*100)
h.fill(a=np.random.random((100,)), tt=["b"]*100)
h.area("a").to(canvas)

The proper way would probably be:

h = Hist(bin("a", 10, 0, 1), groupby("tt"))
h.fill(a=np.random.random((100,)), tt=["a"]*100)
h.fill(a=np.random.random((100,)), tt=["b"]*100)
h.stack("tt").area("a").to(canvas)

but the error message I get when running the first fragment is very puzzling:

...

/Users/ivm/anaconda/lib/python2.7/site-packages/histbook-1.0.9-py2.7.egg/histbook/proj.pyc in handlearray(content)
    504 
    505         def handlearray(content):
--> 506             content = content.reshape((-1, self._shape[-1]))
    507 
    508             out = numpy.zeros((content.shape[0], len(columns)), dtype=content.dtype)

AttributeError: 'list' object has no attribute 'reshape'

jpivarski commented 6 years ago

Okay. Somehow a list gets into the histogram's content, which is something wrong far upstream of this error.

But now I see more of what you're trying to do and it's against the grain of how histbook works. You want

ha = Hist(bin("a", 10, 0, 1), fill=np.random.random((100,)))
hb = Hist(bin("a", 10, 0, 1), fill=np.random.random((100,)))

# one histogram with a new categorical axis named "tt"
h = Hist.group(by="tt", a=ha, b=hb)

# now you can use that categorical axis in plotting
h.stack("tt").area("a").to(canvas)

If your dataset actually had a string field in it, you'd create the categorical axis from those strings, but it doesn't. What you have is more like two Monte Carlo samples, which would come from different datasets, possibly even on different computers. They're distinguished by where you find the data, not by a categorical field in the data, so you want to make two histograms and bring them together.

Tomorrow, I'll be able to spend some time on histbook and knock out all or most of these issues.

imandr commented 6 years ago

In fact, yes, a categorical field is exactly what I wanted. Initially I got the plotting wrong because I omitted stack("tt"); I know that now. But the error message was misleading: it took me some debugging to figure this out, and then I discovered that I just needed to add stack("tt"), and it all works fine.

jpivarski commented 6 years ago

Yes, the error message indicates that _content has acquired type list, which it should never have. Contents must be dict and ndarray only. This is a bug and the error message is misleading because the bug is far upstream of the actual error.
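
The invariant stated here (dict levels for categorical axes, ndarray leaves for the binned counts) could be checked early, so that a stray list fails with a message that names the offending entry instead of surfacing later as a missing `reshape`. A minimal sketch, assuming a hypothetical helper `check_content` that is not part of histbook:

```python
import numpy as np

def check_content(content, path="content"):
    """Hypothetical helper (not histbook API): verify that nested histogram
    content consists only of dicts whose leaves are ndarrays."""
    if isinstance(content, dict):
        # Recurse into each category, extending the path for error reporting.
        for key, value in content.items():
            check_content(value, "{0}[{1!r}]".format(path, key))
    elif not isinstance(content, np.ndarray):
        raise TypeError("{0} has type {1}; expected dict or numpy.ndarray"
                        .format(path, type(content).__name__))

good = {"a": np.zeros(10), "b": np.zeros(10)}
check_content(good)  # passes silently

bad = {"a": np.zeros(10), "b": [0.0] * 10}  # a list has slipped in
try:
    check_content(bad)
except TypeError as err:
    message = str(err)
```

With a check like this, the failure would point at the entry holding the list rather than at the projection code that later tries to reshape it.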

jpivarski commented 6 years ago

Now that I'm at a laptop, I can't reproduce the error reported in the first message. In 1.0.9 (enhancements branch), the contents are correct (dict containing ndarray) and it plots correctly. Maybe this is what you found later, when you got it to work on your remote workers.

So stacking Monte Carlo samples can be done both ways, by filling histograms separately and then grouping them or by filling one histogram with a piecewise constant categorical field. I believe that the former is more "natural," but it would be a problem if both didn't work.
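
That the two approaches give the same per-category counts can be illustrated with plain NumPy, independent of histbook. This is only a sketch: `edges`, `grouped`, and `by_field` are illustrative names, and `numpy.histogram` stands in for histbook's own filling.

```python
import numpy as np

rng = np.random.RandomState(0)
edges = np.linspace(0.0, 1.0, 11)  # 10 bins on [0, 1], like bin("a", 10, 0, 1)
sample_a = rng.random_sample(100)
sample_b = rng.random_sample(100)

# Approach 1: fill one histogram per sample, then group them under labels.
grouped = {
    "a": np.histogram(sample_a, bins=edges)[0],
    "b": np.histogram(sample_b, bins=edges)[0],
}

# Approach 2: fill once, using a categorical field that is constant per sample.
values = np.concatenate([sample_a, sample_b])
labels = np.array(["a"] * 100 + ["b"] * 100)
by_field = {
    str(label): np.histogram(values[labels == label], bins=edges)[0]
    for label in np.unique(labels)
}

# Both routes should produce identical counts for every category.
same = all(np.array_equal(grouped[k], by_field[k]) for k in grouped)
```

Here `same` comes out true, since selecting rows by label recovers exactly the values that were histogrammed separately in the first approach.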