rleonid / oml

OCaml Math Library
Apache License 2.0

more general histograms? #117

Open nilsbecker opened 8 years ago

nilsbecker commented 8 years ago

the current histogram operates on float data only. biocaml has a more general histogram type, built on their accum structure where the data type is polymorphic, and one has to pass a comparison function, as well as a function which increments the count which is also polymorphic. this is quite general and seems useful, but biocaml depends on core and is quite specialized. maybe a more general histogram would be something for a less specialized math/stats library like oml?
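To make the idea concrete, here is a minimal sketch of a histogram that is polymorphic in both the data type and the accumulator, driven by a caller-supplied comparison function and increment function. It uses only the OCaml standard library. All names (`hist`, `make`, `find_bin`, `add`) are hypothetical and are not taken from Biocaml or oml.

```ocaml
(* Hypothetical sketch; these names do not come from Biocaml or oml. *)
type ('a, 'c) hist = {
  cmp : 'a -> 'a -> int;  (* ordering on bin boundaries *)
  bins : 'a array;        (* sorted boundaries; bin i covers [bins.(i), bins.(i+1)) *)
  counts : 'c array;      (* one accumulator per bin *)
}

(* Build an empty histogram; at least two distinct boundaries are needed.
   [zero] should be an immutable value, since Array.make shares it. *)
let make cmp zero boundaries =
  let bins = Array.of_list (List.sort_uniq cmp boundaries) in
  if Array.length bins < 2 then None
  else Some { cmp; bins; counts = Array.make (Array.length bins - 1) zero }

(* Binary search for the index i with bins.(i) <= x < bins.(i+1). *)
let find_bin h x =
  let n = Array.length h.bins in
  if h.cmp x h.bins.(0) < 0 || h.cmp x h.bins.(n - 1) >= 0 then None
  else begin
    let lo = ref 0 and hi = ref (n - 1) in
    (* Invariant: bins.(lo) <= x < bins.(hi). *)
    while !hi - !lo > 1 do
      let mid = (!lo + !hi) / 2 in
      if h.cmp x h.bins.(mid) < 0 then hi := mid else lo := mid
    done;
    Some !lo
  end

(* Apply the caller's increment function to the bin containing x;
   values outside all bins are ignored. *)
let add incr h x =
  match find_bin h x with
  | None -> ()
  | Some i -> h.counts.(i) <- incr h.counts.(i)
```

For example, `make String.compare 0 ["a"; "h"; "p"; "z"]` yields integer counters over three string-valued bins, and `add succ h "kiwi"` increments the middle one.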

rleonid commented 8 years ago

Hello,

After a bit of digging through Biocaml, I found the histogram and accu modules. By accum I think that you are referring to the latter. I agree that it is potentially very useful, though I think it serves the more general purpose of aggregating and grouping data, and I'm not certain that I want that functionality in a function that creates histograms. Perhaps in a separate module?

Regarding the histogram implementation, very often I don't know the boundaries of a potential histogram. At these points I'll use the `` `Buckets _ `` or `` `Width _ `` arguments to get a quicker sense of the data before investigating further; I want to preserve this ability to generate a simple `assoc`-like result that I can easily interpret. That is why I like its signature and I am a bit hesitant about a more general `Histogram` module, but I would definitely merge one. Maybe we can afterwards rename the function in `Descriptive` to something like `simple_hist` :smile:
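As an illustration of the auto-binning described here, the following is a rough sketch of a float histogram that takes either a bucket count or a bin width and returns an association-style array of `(lower_boundary, count)` pairs. It is not oml's implementation. The name `simple_hist` is borrowed from the suggestion above, and the exact signature and edge-case behaviour are assumptions.

```ocaml
(* Hypothetical sketch, not oml's implementation. *)
let simple_hist spec data =
  if Array.length data = 0 then [||]
  else begin
    (* The bin boundaries are derived from the data range itself. *)
    let mn = Array.fold_left min data.(0) data in
    let mx = Array.fold_left max data.(0) data in
    (* Derive bin width and bin count from whichever argument was given. *)
    let width, n =
      match spec with
      | `Buckets n ->
        if n <= 0 then invalid_arg "simple_hist: non-positive bucket count";
        let w = (mx -. mn) /. float_of_int n in
        (* Fall back to an arbitrary width when all values are equal. *)
        (if w > 0. then w else 1.), n
      | `Width w ->
        if w <= 0. then invalid_arg "simple_hist: non-positive width";
        w, int_of_float ((mx -. mn) /. w) + 1
    in
    let counts = Array.make n 0 in
    Array.iter
      (fun x ->
        (* Clamp so that the maximum value lands in the last bin. *)
        let i = min (n - 1) (int_of_float ((x -. mn) /. width)) in
        counts.(i) <- counts.(i) + 1)
      data;
    (* Pair each bin's lower boundary with its count. *)
    Array.mapi (fun i c -> (mn +. float_of_int i *. width, c)) counts
  end
```

Calling `` simple_hist (`Buckets 2) [|0.; 1.; 2.; 3.; 4.|] `` gives two bins starting at `0.` and `2.`, without the caller having to know the data range in advance.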

I agree 100% that limiting the possible data input to floats (or `_ array`s for that matter) is a big limitation and something that the library has to rectify, but at the moment I don't have a good flexible strategy for accomplishing that. If I had to do that right now, I think it would require functorizing many methods, and I would prefer to use modular implicits to solve these problems. But again, it's a fun thing to try, and a PR would be welcome.
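To show what the functorized route might look like, here is a small sketch in which the data type, the bin type, and the binning rule are supplied as a module argument. It uses only the standard library `Map.Make`. The names `Binned`, `Make_hist`, `By_length`, and `Len_hist` are hypothetical and not part of oml.

```ocaml
(* Hypothetical sketch of a functorized histogram; not part of oml. *)
module type Binned = sig
  type t                            (* raw observation *)
  type bin                          (* label of the bin an observation falls into *)
  val bin_of : t -> bin             (* binning rule *)
  val compare : bin -> bin -> int   (* ordering used to sort the bins *)
end

module Make_hist (B : Binned) = struct
  module M = Map.Make (struct
    type t = B.bin
    let compare = B.compare
  end)

  (* Count observations per bin; the result is sorted by bin. *)
  let of_list xs : (B.bin * int) list =
    List.fold_left
      (fun m x ->
        let b = B.bin_of x in
        let c = match M.find_opt b m with Some c -> c | None -> 0 in
        M.add b (c + 1) m)
      M.empty xs
    |> M.bindings
end

(* Example instance: bin strings by their length. *)
module By_length = struct
  type t = string
  type bin = int
  let bin_of = String.length
  let compare : int -> int -> int = compare
end

module Len_hist = Make_hist (By_length)
```

The cost is visible even in this small case: every new data type needs its own module and functor application, which is the boilerplate that modular implicits would be expected to remove.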

nilsbecker commented 8 years ago

yes, sorry i meant Accu. disclaimer: i have not actually used it, just browsed the interface. i agree that the auto-binning is very useful, and any more general histogram should not replace a more specialized float-based histogram interface. also, obviously multidimensional float histograms for vector valued data would be great to have. by far the most important would be 2d i believe. not sure i can find the time for a PR but if i do, i'll give it a shot.
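For reference, a 2d float histogram on a fixed grid can be sketched in a few lines. The function name `hist2d`, its labelled arguments, and the choice to drop out-of-range points are all assumptions made for illustration, not an existing oml interface.

```ocaml
(* Hypothetical sketch: count (x, y) pairs on a fixed nx-by-ny grid. *)
let hist2d ~nx ~ny ~x_range:(x0, x1) ~y_range:(y0, y1) points =
  let counts = Array.make_matrix nx ny 0 in
  (* Map a value to its bin index along one axis.
     The range is half-open, [lo, hi); values outside it are dropped. *)
  let index n lo hi v =
    if v < lo || v >= hi then None
    else
      Some (min (n - 1)
              (int_of_float (float_of_int n *. (v -. lo) /. (hi -. lo))))
  in
  Array.iter
    (fun (x, y) ->
      match index nx x0 x1 x, index ny y0 y1 y with
      | Some i, Some j -> counts.(i).(j) <- counts.(i).(j) + 1
      | _ -> ())
    points;
  (* counts.(i).(j) holds the count for x-bin i and y-bin j. *)
  counts
```

Unlike the auto-binned 1d case, this version requires the ranges up front; deriving them from the data would need a first pass over `points`.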

rleonid commented 8 years ago

Sounds good.