incanter / incanter

Clojure-based, R-like statistical computing and graphics environment for the JVM
http://incanter.org

Migrate away from java wrappers for CERN/Colt stats functions to kixi.stats #380

Open joinr opened 6 years ago

joinr commented 6 years ago

Upon building, I noticed we're mixing in Java source. Curious, I looked and saw that incanter-core has some wrappers for Colt-specific stuff: one for matrices and one for the Weibull distribution.

I think we can excise these: core.matrix should take care of the matrix stuff, and kixi.stats has a Weibull distribution (along with dozens of others).

I recommend latching onto kixi.stats for canonical distributions: that leverages the momentum there, taps into that active community, and simultaneously drops extraneous Java build steps and deps.

joinr commented 6 years ago

I think we can pretty much wipe out the incanter.Matrix class entirely. There are some commented-out legacy functions in incanter.core around line 2928 (block-diag, block-matrix, separate-blocks, diagonal-blocks) that call (new Matrix). Other than that, it's a ghost.

joinr commented 6 years ago

cern.jet.math.tdouble.DoubleArithmetic is powering combinatorial functions like incanter.core/choose and incanter.core/factorial.
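
For reference, here is a minimal dependency-free sketch of what those two functions compute, written in plain Clojure. The names and the exact-integer behaviour are assumptions for illustration: these are stand-ins, not incanter's implementations, and the Colt-backed versions work in doubles rather than arbitrary-precision integers.

```clojure
;; Hypothetical stand-ins for incanter.core/factorial and incanter.core/choose.
;; They use exact integer arithmetic, which is an assumption of this sketch.

(defn factorial
  "n! for a non-negative integer n. Uses *' so results promote to BigInt
   instead of overflowing a long."
  [n]
  {:pre [(integer? n) (not (neg? n))]}
  (reduce *' 1 (range 1 (inc n))))

(defn choose
  "Binomial coefficient C(n, k) for integers 0 <= k <= n.
   Uses the multiplicative formula so no full factorial is ever built;
   each intermediate value is itself a binomial coefficient, so quot is exact."
  [n k]
  {:pre [(integer? n) (integer? k) (<= 0 k n)]}
  (let [k (min k (- n k))]                ; exploit C(n, k) = C(n, n - k)
    (reduce (fn [acc i] (quot (*' acc (+ (- n k) i)) i))
            1
            (range 1 (inc k)))))
```

For example, `(choose 5 2)` evaluates to 10 and `(factorial 5)` to 120. Whether exact integers or doubles are the right return type for incanter is a design question this sketch does not settle.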

joinr commented 6 years ago

incanter.interpolation.Utils is a custom Java class that's used for cubic-spline interpolation in incanter.interp.cubic-spline. It's doing some matrix math using ArrayLists. I'm wondering if that can be replaced with core.matrix.
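
For illustration only: fitting a cubic spline conventionally reduces to solving a tridiagonal linear system. Assuming that is the kind of matrix math Utils is doing (not verified against the Java source), a plain-Clojure version of that step could look like the sketch below. The function name and argument layout are hypothetical, and it uses persistent vectors rather than core.matrix.

```clojure
;; Hypothetical helper: Thomas algorithm for a tridiagonal system.
;; Assumes n >= 1 and a well-conditioned (e.g. diagonally dominant) system;
;; there is no pivoting.

(defn solve-tridiagonal
  "Solves a tridiagonal system of n equations.
   a: sub-diagonal   (length n, element 0 is ignored)
   b: main diagonal  (length n)
   c: super-diagonal (length n, element n-1 is ignored)
   d: right-hand side (length n)
   Returns the solution as a vector."
  [a b c d]
  (let [n (count d)
        ;; Forward sweep: eliminate the sub-diagonal, building modified
        ;; super-diagonal (cp) and right-hand side (dp) coefficients.
        [cp dp]
        (loop [i 1
               cp [(/ (nth c 0) (nth b 0))]
               dp [(/ (nth d 0) (nth b 0))]]
          (if (= i n)
            [cp dp]
            (let [denom (- (nth b i) (* (nth a i) (peek cp)))]
              (recur (inc i)
                     (conj cp (/ (nth c i) denom))
                     (conj dp (/ (- (nth d i) (* (nth a i) (peek dp)))
                                 denom))))))]
    ;; Back substitution, from the last unknown to the first.
    (loop [i (- n 2)
           xs (list (peek dp))]
      (if (neg? i)
        (vec xs)
        (recur (dec i)
               (conj xs (- (nth dp i) (* (nth cp i) (first xs)))))))))
```

With core.matrix the same system could presumably be handed to a general linear solver instead, at the cost of ignoring the tridiagonal structure.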

joinr commented 6 years ago

I think we can replace incanter.core/choose and incanter.core/factorial with equivalents from clojure.math.combinatorics. The Weibull implementation is not directly replaceable, since it provides both pdf and cdf methods (although it's only being used for those specific methods). kixi.stats provides similar functionality, but only the pdf. Most of the canonical distributions use the cern.jet libraries and provide both pdf and cdf, so migrating from Jet means implementing cdfs for those distributions. Seems like kixi is halfway there. I don't know if there are performance reasons to stick with Jet either (haven't tested); it could be meaningful for simulation, but I'm unsure.

To mitigate the need for custom Java source code, we could also migrate stuff into native Clojure. There are some hairy Java infix math expressions, but it could be done.
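
To make the pdf/cdf gap concrete, here is a minimal sketch of both functions for the Weibull case in plain Clojure, using the standard closed forms with shape k and scale λ: pdf(x) = (k/λ)(x/λ)^(k-1) exp(-(x/λ)^k) and cdf(x) = 1 - exp(-(x/λ)^k) for x >= 0. The function names and argument order are assumptions for illustration and are not taken from incanter, kixi.stats, or Colt.

```clojure
;; Hypothetical names; standard two-parameter Weibull with shape > 0, scale > 0.

(defn weibull-pdf
  "Density of Weibull(shape, scale) at x. Zero for negative x."
  [shape scale x]
  (if (neg? x)
    0.0
    (let [z (/ (double x) (double scale))]
      (* (/ (double shape) (double scale))
         (Math/pow z (dec (double shape)))
         (Math/exp (- (Math/pow z (double shape))))))))

(defn weibull-cdf
  "Cumulative probability P(X <= x) for Weibull(shape, scale)."
  [shape scale x]
  (if (neg? x)
    0.0
    (let [z (/ (double x) (double scale))]
      (- 1.0 (Math/exp (- (Math/pow z (double shape))))))))
```

Weibull is the easy case because its cdf has a closed form. Distributions whose cdf needs special functions (incomplete gamma or beta, for instance) would take more work to port, and that is where the Jet dependency is harder to drop.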

joinr commented 6 years ago

I managed to absorb weibull.java into an implementation in incanter.distributions.
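
As a rough picture of what such a native implementation can look like, here is a sketch of a Weibull record behind a small protocol, with sampling done by inverting the cdf. The protocol, record, and function names below are hypothetical and are not claimed to match what is in incanter.distributions.

```clojure
;; Hypothetical protocol and record, for illustration only.

(defprotocol ContinuousDist
  (pdf [d x] "Density at x.")
  (cdf [d x] "P(X <= x).")
  (quantile [d p] "Inverse cdf for p in [0, 1)."))

(defrecord Weibull [shape scale]
  ContinuousDist
  (pdf [_ x]
    (if (neg? x)
      0.0
      (let [z (/ (double x) (double scale))]
        (* (/ (double shape) (double scale))
           (Math/pow z (dec (double shape)))
           (Math/exp (- (Math/pow z (double shape))))))))
  (cdf [_ x]
    (if (neg? x)
      0.0
      (- 1.0 (Math/exp (- (Math/pow (/ (double x) (double scale))
                                    (double shape)))))))
  (quantile [_ p]
    ;; Solve p = 1 - exp(-(x/scale)^shape) for x.
    (* (double scale)
       (Math/pow (- (Math/log (- 1.0 (double p))))
                 (/ 1.0 (double shape))))))

(defn draw
  "Draws one sample by inverse-transform sampling: push a uniform [0, 1)
   variate through the quantile function."
  [d ^java.util.Random rng]
  (quantile d (.nextDouble rng)))
```

Usage would be along the lines of `(draw (->Weibull 2.0 1.0) (java.util.Random. 42))`. Passing the random source explicitly keeps simulations reproducible, which matters if performance or correctness is later compared against the Jet-backed versions.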