Open superbobry opened 8 years ago
The short answer is not Vector.t specifically. Vector.t (and Matrices) is a stopgap that is woefully insufficient for what I actually want from a set of data structures supporting linear algebra and statistical analysis.
The long answer is yes, and I was working on implementing this in Bau. It wasn't a showstopper, but I quickly arrived at the conclusion that one needs ppx rewriters to achieve fast Bigarray iteration. My next step was going to be creating a variant layer that would manage data representation with minimal copying. But that is a very big task that I didn't (and unfortunately still don't) have the bandwidth to tackle.
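To make the "variant layer" idea concrete, here is a minimal sketch, assuming a hypothetical `data` type that wraps the representations discussed in this thread. None of these names come from oml or Bau; the only point is that a function such as `mean` can read each representation in place instead of converting it first.

```ocaml
(* Hypothetical variant layer: these names are illustrative, not oml's API. *)
type data =
  | Arr of float array
  | Lst of float list
  | Big of (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Fold over whichever representation we were given, without copying it. *)
let fold f init = function
  | Arr a -> Array.fold_left f init a
  | Lst l -> List.fold_left f init l
  | Big b ->
    let acc = ref init in
    for i = 0 to Bigarray.Array1.dim b - 1 do
      acc := f !acc (Bigarray.Array1.get b i)
    done;
    !acc

let length = function
  | Arr a -> Array.length a
  | Lst l -> List.length l
  | Big b -> Bigarray.Array1.dim b

(* One mean for every representation. *)
let mean d =
  let n = length d in
  if n = 0 then invalid_arg "mean: empty input";
  fold (+.) 0.0 d /. float_of_int n
```

The cost of this design is a pattern match per call rather than per element, so the inner loops stay monomorphic. Whether that is fast enough for the Bigarray case is exactly the open question raised above.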
Perhaps this is as good a spot as any to write down my thoughts and experiences on the subject. I think there are several issues in this space, some of which interact in various ways:
float arrays out of convenience, but depending upon the situation (size, convenience) one can make arguments for Bigarrays or float lists. I'm still not certain whether the library should remain flexible (operate over different inputs), strict (one type, and everything is converted into that type) or intelligent (take lots of different types as input, and copy when/if needed). My inclination is for an "intelligent" approach (my naming of that approach exposes my bias), but how to balance the features (and speed) found in a vector-based language (e.g. J) against one that is expressive of statistical problems (e.g. R) is still unknown to me. The other complications in this area are:
Bigarray, Lacaml stuff. To me this seems pressing and perhaps a good argument for sticking with float arrays at the moment.
floats. I want these things to have more of an OCaml/EDSL style (transposing a matrix twice should be a no-op) and they should be agnostic to the actual representation (computing the mean should be an operation that can take a float array, a float list, etc.).
All of this shouldn't discourage you from submitting a PR; working code is infinitely more valuable than no code. I just want to share my thinking on the subject.
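As an illustration of the EDSL style mentioned above, here is a minimal sketch in which transposing twice really is a no-op. The `expr` type and the `transpose`/`eval` functions are assumptions made up for this example, not anything that exists in oml.

```ocaml
(* Hypothetical expression type for a tiny matrix EDSL; not part of oml. *)
type expr =
  | Mat of float array array  (* a concrete, row-major matrix *)
  | Transpose of expr         (* a transpose that has not been computed yet *)

(* Smart constructor: transposing a pending transpose cancels it out. *)
let transpose = function
  | Transpose e -> e
  | e -> Transpose e

(* Materialise the matrix only when a concrete result is needed. *)
let rec eval = function
  | Mat m -> m
  | Transpose e ->
    let m = eval e in
    let rows = Array.length m in
    if rows = 0 then [||]
    else
      let cols = Array.length m.(0) in
      Array.init cols (fun j -> Array.init rows (fun i -> m.(i).(j)))
```

With this encoding `transpose (transpose m)` returns `m` itself, so no data is touched until `eval` is called.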
A bit OT, but note that things like flambda are not available in bytecode, which is what js_of_ocaml starts from.
Which leads me to something I have wanted to ask for a long time: how much of oml can be compiled to the browser with js_of_ocaml? E.g. bigarrays are available there, but I doubt the lacaml stuff can be brought there.
I did not realize that bigarrays are available on js_of_ocaml! But you're also pointing at another subtle issue of bytecode vs native.
I've reserved a bit of time this week and next to see how much of oml I can build without lacaml and lbfgs: an OCaml-only oml. This will precipitate even more build system hacking, but hopefully an elegant solution will materialize.
@dbuenzli If you have a chance, take a look at oml-lite. Some of the big stuff (e.g. multivariate and logistic regression) is missing. But it is viable, and hopefully the framework for iteration has been laid.
@rleonid Caveat: I have little knowledge of how oml is structured (I have never used it, yet). However, I do have the impression that extracting oml-lite the way you did, through cppo, means 1) you are going to live a miserable build life in the long term, and 2) other third-party libraries wanting to build on top of oml will have to choose between oml-lite and oml, and the final user of the third-party library may not agree with the choice that library made.
My approach would rather be to try to reorganize the API so that oml-lite shows up naturally as a single library, with other libraries gradually mixing in the C and Fortran dependencies. The whole could then be distributed through a single package, the sub-libraries letting end users control the amount of C or Fortran they want to bring into their code base.
@dbuenzli Thank you for your feedback. I've replied here https://github.com/hammerlab/oml/issues/173, so maybe we can keep these discussions organized.
Have you considered making Vector.t strided (cf. NumPy)?
The major benefits of this approach are:
Code reuse between Vectors and Matrices, e.g.
Update: I probably should've added this as a comment to #27.
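To sketch what a strided vector in the NumPy sense could look like, here is a minimal example. The record types and field names are hypothetical and are not oml's Vector.t; the backing store is a plain float array for simplicity. Rows and columns of a row-major matrix become views over the same storage, so a single `sum` serves both.

```ocaml
(* Hypothetical strided vector; field names are illustrative only. *)
type vec = {
  data : float array;  (* backing store, shared and never copied *)
  offset : int;        (* index of the first element in [data] *)
  stride : int;        (* distance between consecutive elements *)
  length : int;        (* number of elements visible through the view *)
}

let of_array data = { data; offset = 0; stride = 1; length = Array.length data }

let get v i =
  if i < 0 || i >= v.length then invalid_arg "get: index out of bounds";
  v.data.(v.offset + (i * v.stride))

(* Row-major matrix over a flat array (also hypothetical). *)
type mat = { store : float array; rows : int; cols : int }

(* Rows and columns are strided views onto the matrix storage. *)
let row m i = { data = m.store; offset = i * m.cols; stride = 1; length = m.cols }
let col m j = { data = m.store; offset = j; stride = m.cols; length = m.rows }

(* Written once against [vec], usable on vectors, rows and columns. *)
let fold f init v =
  let acc = ref init in
  for i = 0 to v.length - 1 do
    acc := f !acc (get v i)
  done;
  !acc

let sum v = fold (+.) 0.0 v
```

For a 2x3 matrix stored as `[| 1.; 2.; 3.; 4.; 5.; 6. |]`, `row m 1` views the elements 4, 5, 6 and `col m 0` views 1, 4, without allocating a new array in either case.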