Open nilsbecker opened 8 years ago
Hello,
After a bit of digging through Biocaml, I found the histogram and accu modules. By `accum` I think that you are referring to the latter. I agree that it is potentially very useful, though I think it serves the more general purpose of aggregating and grouping data, and I'm not certain that I want that functionality in a function that creates histograms. Perhaps in a separate module?
Regarding the `histogram` implementation: very often, I don't know the boundaries of a potential histogram. At these points I'll use the `Buckets _` or `Width _` arguments to get a quicker sense of the data before investigating further; I want to preserve this ability to generate a simple `assoc`-like result that I can easily interpret. That is why I like its signature and I am a bit hesitant about a more general `Histogram` module, but I would definitely merge one. Maybe we can afterwards rename the function in `Descriptive` to something like `simple_hist` :smile:
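To make the "quick look" use case concrete, here is a minimal sketch of an auto-binning float histogram that takes either a bucket count or a bucket width and returns an assoc-like array of (left bin edge, count). The name `simple_hist`, the argument order, and the edge handling are assumptions made for illustration; this is not the library's actual implementation.

```ocaml
(* Hypothetical sketch of an auto-binning histogram; not the library's code. *)
let simple_hist (spec : [ `Buckets of int | `Width of float ])
    (data : float array) : (float * int) array =
  if Array.length data = 0 then [||]
  else begin
    (* The boundaries are taken from the data itself. *)
    let lo = Array.fold_left min data.(0) data in
    let hi = Array.fold_left max data.(0) data in
    let width, n =
      match spec with
      | `Buckets n when n > 0 -> (hi -. lo) /. float_of_int n, n
      | `Width w when w > 0. ->
        w, max 1 (int_of_float (ceil ((hi -. lo) /. w)))
      | _ -> invalid_arg "simple_hist: bucket count and width must be positive"
    in
    let counts = Array.make n 0 in
    Array.iter
      (fun x ->
         let i = if width > 0. then int_of_float ((x -. lo) /. width) else 0 in
         (* Assumption: the maximum is clamped into the last bin. *)
         let i = min i (n - 1) in
         counts.(i) <- counts.(i) + 1)
      data;
    (* Pair each count with the left edge of its bin. *)
    Array.mapi (fun i c -> (lo +. float_of_int i *. width, c)) counts
  end

let example = simple_hist (`Buckets 2) [| 0.; 1.; 2.; 3.; 4. |]
```

With this sketch, `Buckets 2` and `Width 2.` give the same two bins on the example data, starting at 0. and 2.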
I agree 100% that limiting the possible data input to `float`s (or `_ array`s for that matter) is a big limitation and something that the library has to rectify, but at the moment I don't have a good, flexible strategy for accomplishing that. If I had to do it right now, I think it would require functorizing many methods, and I would prefer to use modular implicits to solve these problems. But again, it's a fun thing to try and a PR would be welcome.
yes, sorry i meant `Accu`. disclaimer: i have not actually used it, just browsed the interface. i agree that the auto-binning is very useful, and any more general histogram should not replace a more specialized float-based histogram interface. also, obviously multidimensional float histograms for vector valued data would be great to have. by far the most important would be 2d i believe. not sure i can find the time for a PR but if i do, i'll give it a shot.
Sounds good.
the current histogram operates on float data only. biocaml has a more general histogram type, built on their `accum` structure, where the data type is polymorphic, and one has to pass a comparison function, as well as a function which increments the count, which is also polymorphic. this is quite general and seems useful, but biocaml depends on core and is quite specialized. maybe a more general histogram would be something for a less specialized math/stats library like oml?
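a rough sketch of what such a general histogram could look like using only the standard library, with no core dependency. the comparison is supplied through a functor argument rather than as a plain function, and the count type stays polymorphic with a caller-supplied increment. all names here (`Make_histogram`, `create`, `add`, `to_assoc`) are hypothetical and are not biocaml's or oml's api.

```ocaml
(* Hypothetical generic histogram; illustrative names only. *)
module Make_histogram (Key : Map.OrderedType) = struct
  module M = Map.Make (Key)

  (* The count type is polymorphic: the caller supplies [zero] and [incr]. *)
  type 'count t = {
    zero : 'count;
    incr : 'count -> 'count;
    bins : 'count M.t;
  }

  let create ~zero ~incr = { zero; incr; bins = M.empty }

  (* Increment the count stored under [key], starting from [zero]. *)
  let add h key =
    let current =
      match M.find_opt key h.bins with
      | Some c -> c
      | None -> h.zero
    in
    { h with bins = M.add key (h.incr current) h.bins }

  (* Bins come back sorted by [Key.compare]. *)
  let to_assoc h = M.bindings h.bins
end

(* Usage: counting string occurrences with integer counts. *)
module String_hist = Make_histogram (String)

let counts =
  List.fold_left String_hist.add
    (String_hist.create ~zero:0 ~incr:succ)
    [ "a"; "b"; "a"; "c"; "a" ]
  |> String_hist.to_assoc
```

the same functor would accept any ordered key, for example a binned `(int * int)` pair for 2d data, given a module providing `compare` for it.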