thouis / numpy-trac-migration

numpy Trac to github issues migration

Histograms (1d, 2d, nd) (migrated from Trac #189) #1742

Closed thouis closed 11 years ago

thouis commented 11 years ago

Original ticket http://projects.scipy.org/numpy/ticket/189 Reported 2006-07-19 by atmention:huard, assigned to atmention:teoliphant.

This patch provides two new functions: histogram1d and histogramdd. The goal of histogram1d is to eventually replace histogram. It computes a one-dimensional histogram column-wise or row-wise, depending on the axis keyword. It differs slightly from histogram in that it returns all the bin edges instead of only the leftmost ones. Also, values outside the bin edges are not counted.
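The two behaviours described above (returning all N+1 bin edges, and not counting values outside the edges) can be illustrated with the present-day `np.histogram`. This is only a sketch: it does not use the patch's histogram1d, and the sample data is made up for the illustration.

```python
import numpy as np

# Made-up sample: two values (-1.0 and 7.0) lie outside the requested range.
data = np.array([-1.0, 0.2, 0.5, 1.5, 2.5, 7.0])

# Three bins over [0, 3]; present-day np.histogram, not the patch's histogram1d.
counts, edges = np.histogram(data, bins=3, range=(0.0, 3.0))

# All bin edges are returned, so there is one more edge than there are bins.
print(len(counts), len(edges))

# Out-of-range values are dropped, so the counts add up to 4, not 6.
print(counts.sum())
```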

histogramdd computes a d dimensional histogram from an NxD array or a sequence of D arrays.
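As a rough illustration of the two input forms, the sketch below calls the present-day `np.histogramdd` once with an NxD array and once with a sequence of D arrays. The random sample, bin counts and range are assumptions chosen for the example, and the patch's original function may have differed in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.random((100, 3))          # N=100 points in D=3 dimensions, all in [0, 1)

bins = (4, 5, 6)
limits = [(0.0, 1.0)] * 3

# Form 1: a single NxD array, one row per point.
hist_a, edges_a = np.histogramdd(sample, bins=bins, range=limits)

# Form 2: a sequence of D arrays, one array per coordinate.
coords = [sample[:, 0], sample[:, 1], sample[:, 2]]
hist_b, edges_b = np.histogramdd(coords, bins=bins, range=limits)

print(hist_a.shape)                    # one axis per dimension
print(np.array_equal(hist_a, hist_b))  # both input forms give the same histogram
```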

Tests for both functions are included.

The patch also removes the dependence on histogram from histogram2d and corrects a bug.

thouis commented 11 years ago

Attachment in Trac by atmention:huard, 2006-07-19: histograms.patch

thouis commented 11 years ago

Comment in Trac by atmention:teoliphant, 2006-07-20

Error corrections accepted.

I don't like having both histogram and histogram1d, so I'm holding off on the other patches.

Perhaps these functions belong in SciPy anyway.

thouis commented 11 years ago

Comment in Trac by atmention:huard, 2006-08-09

histogram2d is still buggy. I'll fix it this week and add a range keyword.

I suggest that histogram be moved into the compatibility module and replaced by histogram1d (renamed to histogram). I suggested this on the list but got no feedback; I don't know whether this silence means acceptance or rejection.

Since histogram2d is a special case of histogramdd, it could simply be replaced by the latter (which is also buggy, I'll fix it too during the week.)
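The claim that histogram2d is a special case of histogramdd can be checked against present-day NumPy. This is a sketch with made-up data; it says nothing about the state of the 2006 code.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.normal(size=200)

bins = (5, 7)
limits = [(-3.0, 3.0), (-3.0, 3.0)]

# Two-dimensional histogram through the dedicated function.
h2, xedges, yedges = np.histogram2d(x, y, bins=bins, range=limits)

# The same histogram through the d-dimensional function with D=2.
hd, edges = np.histogramdd(np.column_stack([x, y]), bins=bins, range=limits)

print(np.array_equal(h2, hd))
```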

thouis commented 11 years ago

Attachment in Trac by atmention:huard, 2006-08-11: histogram2d.patch

thouis commented 11 years ago

Attachment in Trac by atmention:huard, 2006-08-11: histogramdd.py

thouis commented 11 years ago

Comment in Trac by atmention:huard, 2006-08-11

I'll work on the histogram1d function to add support for weight data and provide an extensive unit test. Then I'll submit to the list an official proposal for the switch.

thouis commented 11 years ago

Comment in Trac by atmention:teoliphant, 2006-09-14

histogram2d patched

histogramdd added as histogramnd

thouis commented 11 years ago

Comment in Trac by atmention:teoliphant, 2006-09-14

histogram1d has an axis keyword but only works for 2 dimensions. It should work for an N-dimensional array.

Moving histogram to a compatibility module is more problematic. Histogram has always placed out-of-range values in the upper and lower bins. Changing this behavior will create issues for some people.

This needs more discussion.

thouis commented 11 years ago

Comment in Trac by atmention:huard, 2006-10-23

Replying to [comment:5 oliphant]:

histogram1d has an axis keyword but only works for 2 dimensions. It should work for an N-dimensional array.

Done

Moving histogram to a compatibility module is more problematic. Histogram has always placed out-of-range values in the upper and lower bins. Changing this behavior will create issues for some people.

Currently, only upper out-of-range values are stored; lower outliers are not counted at all. Somebody on the list suggested returning a dictionary with upper and lower outliers. This is done. I tried to minimize code breakage by keeping two return values (hist, dict) and keeping the order of the calling arguments, so that someone calling `histogram(arr, 20)[0]` won't see any difference.

However, here is what will break:

  1. Second return value: Instead of returning (hist_array, left_edges), histogram now returns (hist_array, dict). The dict contains {'edges': the bin edges (N+1), 'upper': upper outliers, 'lower': lower outliers, 'bincenters': the bin centers (N)}.
  2. Explicit range: Outliers are not included in the histogram array, but stored in the dict. This matters only if range or bins is given explicitly. Indeed, if neither range nor bins is given, the range is (min, max), so there are no outliers.
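The (hist, dict) return value described in this list could look roughly like the sketch below. The function name `histogram_with_outliers` is hypothetical, the implementation leans on the present-day `np.histogram`, and it assumes that 'upper' and 'lower' hold outlier counts; the ticket does not say whether they are counts or the outlying values themselves.

```python
import numpy as np

def histogram_with_outliers(a, bins=10, range=None):
    """Hypothetical sketch of the proposed (hist, dict) return value."""
    a = np.asarray(a).ravel()
    hist, edges = np.histogram(a, bins=bins, range=range)
    info = {
        "edges": edges,                                   # N+1 bin edges
        "upper": int(np.count_nonzero(a > edges[-1])),    # assumed: count above the last edge
        "lower": int(np.count_nonzero(a < edges[0])),     # assumed: count below the first edge
        "bincenters": 0.5 * (edges[:-1] + edges[1:]),     # N bin centers
    }
    return hist, info

data = [-5.0, 0.5, 1.5, 2.5, 9.0, 10.0]
hist, info = histogram_with_outliers(data, bins=3, range=(0.0, 3.0))
print(hist, info["lower"], info["upper"])
```

Callers that only index the first return value, as in `histogram(arr, 20)[0]`, would be unaffected by the change in the second one.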

Here are the additions:

  1. Support for weighted samples.
  2. Axis argument to compute 1D histogram along a given axis.
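A minimal sketch of how the two additions might combine, assuming an integer number of bins and an explicit range. The function name `histogram_along_axis` and its signature are hypothetical and are not taken from the attached histogram1d.py.

```python
import numpy as np

def histogram_along_axis(a, bins, range, axis=0, weights=None):
    """Hypothetical sketch: 1D weighted histograms along one axis of an N-d array."""
    a = np.moveaxis(np.asarray(a, dtype=float), axis, -1)
    w = None
    if weights is not None:
        w = np.moveaxis(np.asarray(weights, dtype=float), axis, -1)
    # Shared, equally spaced bin edges for every 1D slice.
    edges = np.linspace(range[0], range[1], bins + 1)
    out = np.empty(a.shape[:-1] + (bins,), dtype=float)
    for idx in np.ndindex(*a.shape[:-1]):
        slice_weights = None if w is None else w[idx]
        out[idx], _ = np.histogram(a[idx], bins=edges, weights=slice_weights)
    # Put the bin axis where the histogrammed axis used to be.
    return np.moveaxis(out, -1, axis), edges

arr = np.array([[0.5, 1.5],
                [0.5, 2.5],
                [1.5, 2.5]])
plain, edges = histogram_along_axis(arr, bins=3, range=(0.0, 3.0), axis=0)
doubled, _ = histogram_along_axis(arr, bins=3, range=(0.0, 3.0), axis=0,
                                  weights=np.full(arr.shape, 2.0))
print(plain[:, 0], plain[:, 1])
```

Each column of `arr` is histogrammed independently here, and a weight of 2 on every sample doubles every bin.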

Concerns expressed on the list were that statistical functions should be put in scipy, or in a numpy statistical module (Tim Hochberg). Which do you prefer? I'll submit a patch accordingly.

thouis commented 11 years ago

Comment in Trac by atmention:huard, 2006-10-24

  1. Support for weighted samples.

I changed the behavior of the function so that if normed=True, the weights are not normalized. This seemed like a useful feature.

thouis commented 11 years ago

Attachment in Trac by atmention:huard, 2006-10-24: histogram1d.py

thouis commented 11 years ago

Attachment in Trac by atmention:huard, 2006-11-14: hist2d_dd.patch

thouis commented 11 years ago

Attachment in Trac by atmention:huard, 2007-03-23: histogramdd.patch

thouis commented 11 years ago

Comment in Trac by atmention:huard, 2007-03-23

The latest patch fixes the bug reported by Ben Granett on the NumPy user list and adds a test. Ticket 455 is also fixed and the corresponding test has been added. Patched against revision 3591.

thouis commented 11 years ago

Comment in Trac by atmention:teoliphant, 2007-04-02

Added patch histogramdd.patch in r3644