rleonid / oml

OCaml Math Library
Apache License 2.0

Oml vs Oml-lite split #173

Closed: rleonid closed this 8 years ago

rleonid commented 8 years ago

@dbuenzli wrote:

@rleonid Caveat: I have little knowledge of how oml is structured (I haven't used it yet). However, I do have the impression that extracting oml-lite the way you did, through cppo, means that 1) you are going to live a miserable build life in the long term, and 2) other third-party libraries wanting to build on top of oml will have to choose between oml-lite and oml, and the final user of the third-party library may not agree with the choice the latter made.

I agree about 1), though I am already living a miserable build life dealing with testing. Currently, the choice of oml-lite or oml is about binding to C code or not (or, as some others have described it, "waiting half an hour for Fortran to compile"). I think it would be very difficult to get around this issue without addressing the build.

I also think that your second point is valid. But it is true of all software. I face similar issues with regard to core and lots of other downstream packages. It is ultimately up to the second-party developer to make smart choices.

My approach would rather be to reorganize the API so that oml-lite shows up naturally as a single library, with other libraries gradually mixing in the C and Fortran dependencies. The whole could then be distributed through a single package, the sub-libraries letting end users control the amount of C or Fortran they want to bring into their code base.

Do you mean something along the lines of distributing each of the sub-packs separately (i.e. Statistics, Classification, etc.)? I generally agree (and think this is the long-term goal), but at the moment I am hesitant to start separating because there are many inter-dependencies, both in how one thinks about the algorithms (e.g. LDA can be used for Classification or Unsupervised learning) and in how the code is written. Furthermore, one of the problems oml is trying to address directly is grouping lots of separate functionality that may not otherwise be thoughtfully integrated. For example, the result of a regression analysis should have hypothesis tests easily associated with it.

This issue is certainly not closed. I hope that you can give oml (or oml-lite) a try and we can talk specifics.

dbuenzli commented 8 years ago

I also think that your second point is valid. But it is true of all software. I face similar issues with regard to core and lots of other downstream packages. It is ultimately up to the second-party developer to make smart choices.

Not really... she can't do anything about it. Suppose she has a lib that only needs oml-lite; the right move would be to depend on that one.

Now suppose a user of lib wants to use the full oml: that user is stuck. This forces the second-party developer to replicate the lib-lite/lib structure, effectively imposing the build mess on everyone.

This wouldn't happen if you had a properly structured library where oml depends on oml-lite (note: whether you distribute them in OPAM separately or not is unrelated; the problem is that, with what you have now added to OPAM, oml does not reuse oml-lite).

Do you mean something along the lines of distributing each of the sub-packs separately (i.e. Statistics, Classification, etc.)?

Not necessarily. It is unclear to me what exactly is provided by the third-party libraries, so it's difficult for me to answer the question.

hammer commented 8 years ago

oml does not reuse oml-lite

@rleonid how hard would it be for oml to reuse oml-lite?

rleonid commented 8 years ago

@hammer The way things are currently structured it would be difficult.

I am hesitant to say impossible, but I actually don't know how to do it without restructuring the project. The main obstacle is the packing logic that gives oml the light namespace hierarchy (Classification, Statistics, etc.). I value this hierarchy because it allows us to semantically catalogue operations: Classification.Descriminant vs Unsupervised.Descriminant (to be implemented), or Statistics.Hypothesis_test vs Regression.Test (also to be implemented). Furthermore, if we were to actually split oml into separate packages, I think these would make more sense to depend upon than a C vs non-C split.

While I think that @dbuenzli is making a good point about wanting oml to depend on oml-lite, it is an idealized point. At this point, the set of people who could need a library that uses oml-lite and then also need oml is much smaller than the set of people who might be attracted to using oml-lite for calculations, or the set of people who might want to contribute code.

dbuenzli commented 8 years ago

the main obstacle being the packing logic that gives oml the light namespace hierarchy (Classification, Statistics ... etc). I value this hierarchy because it allows us to semantically catalogue operations;

I don't see what prevents you from having Oml_lite and Oml namespacing modules, designed to be opened, which define the same toplevel modules, with Oml simply including those of Oml_lite and adding more.
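
A minimal, self-contained sketch of that suggestion. The contents are hypothetical: `Statistics.mean` and `Statistics.variance` are illustrative names, not oml's actual API, and `variance` merely stands in for functionality that would need the C dependency. Client code picks a flavour with a single `open`:

```ocaml
(* Hypothetical namespacing module for the pure-OCaml part of the library. *)
module Oml_lite = struct
  module Statistics = struct
    (* Arithmetic mean of a non-empty float array. *)
    let mean xs =
      Array.fold_left ( +. ) 0. xs /. float_of_int (Array.length xs)
  end
end

(* Hypothetical full namespacing module: the same toplevel names, extended. *)
module Oml = struct
  module Statistics = struct
    (* Re-export everything from the lite flavour... *)
    include Oml_lite.Statistics

    (* ...and add more. Population variance stands in here for anything
       that would require the C dependency. *)
    let variance xs =
      let m = mean xs in
      let n = float_of_int (Array.length xs) in
      Array.fold_left (fun acc x -> acc +. ((x -. m) ** 2.)) 0. xs /. n
  end
end

(* Client code written against the lite flavour... *)
let lite_mean =
  let open Oml_lite in
  Statistics.mean [| 1.; 2.; 3. |]

(* ...compiles unchanged against the full flavour, which also offers more. *)
let full_mean, full_variance =
  let open Oml in
  (Statistics.mean [| 1.; 2.; 3. |], Statistics.variance [| 1.; 2.; 3. |])
```

Switching a client from `open Oml_lite` to `open Oml` needs no other change, and because `Oml.Statistics` includes `Oml_lite.Statistics`, both flavours share a single implementation instead of two preprocessed copies.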

It seems you are underusing the naming and structuring capabilities of the module system. You would not even need to publish more than one package: you can simply use opam's depopts to build Oml's library conditionally.

While I think that @dbuenzli is making a good point about wanting oml to depend on oml-lite, it is an idealized point.

I wouldn't call that an idealized point; you asked for feedback about the approach, and here you have it: what you are doing is simply anti-modular.

In a library ecosystem like opam's, where dependency cones grow quickly, the problem could show up sooner than you'd think. There are ways to provide oml-lite without introducing these problems in the ecosystem, and I don't think they would need much restructuring at the API level beyond the introduction of the aforementioned names and a few opens in client code.

rleonid commented 8 years ago

@dbuenzli How would you address the Functions module?

Writing special mathematical functions is tricky since it requires balancing the trade-offs of convergence, performance and accuracy. Rather than rewriting all of them in pure OCaml, I wrapped Cephes, a suitable C library, as ocephes. There will also be functions that are easily implemented in OCaml (e.g. softmax). Now, AFAIK, since I made the compilation of Functions dependent on the presence of ocephes, the module will have different signatures in oml and oml-lite (or in a library with or without this C dependency), thus preventing the loading of one given the other.
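
For contrast, a function like softmax needs no C at all. Below is a minimal pure-OCaml sketch matching the `?temperature` signature that appears later in the thread; the default temperature of 1.0 and the max-subtraction trick for numerical stability are assumptions, not necessarily what oml does.

```ocaml
(* Softmax with an optional temperature, written in pure OCaml.
   Assumes a non-empty input array and temperature > 0. *)
let softmax ?(temperature = 1.0) (xs : float array) : float array =
  (* Subtract the maximum before exponentiating so large inputs don't overflow;
     this does not change the result mathematically. *)
  let m = Array.fold_left max neg_infinity xs in
  let exps = Array.map (fun x -> exp ((x -. m) /. temperature)) xs in
  let total = Array.fold_left ( +. ) 0. exps in
  Array.map (fun e -> e /. total) exps
```

For example, `softmax [| 1.; 2.; 3. |]` returns probabilities that sum to 1, while `softmax ~temperature:10. [| 1.; 2.; 3. |]` returns a flatter distribution over the same inputs.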

Would you recommend splitting up Functions entirely?

dbuenzli commented 8 years ago

@dbuenzli How would you address the Functions module?

```ocaml
module Oml_lite : sig
  module Functions : sig
    val softmax : ?temperature:float -> float array -> float array
  end
end = struct
  module Functions = Oml_lite_functions
end

module Oml : sig
  module Functions : sig
    include module type of Oml_lite.Functions
    val gamma : float -> float
    (* ... *)
  end
end = struct
  module Functions = struct
    include Oml_lite_functions
    include Oml_functions
  end
end
```

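
A self-contained variant of the snippet above that compiles on its own, assuming hypothetical inline stubs for the `Oml_lite_functions` and `Oml_functions` compilation units (the real `gamma` would come from ocephes; the stub here only handles positive integers). The functor `Third_party` is also hypothetical: it plays the role of a library written only against the lite signature, and it can be applied to either flavour because `Oml.Functions` includes that signature.

```ocaml
(* Hypothetical stand-ins for the two compilation units in the snippet above. *)
module Oml_lite_functions = struct
  (* Pure OCaml: numerically stable softmax with a temperature. *)
  let softmax ?(temperature = 1.0) xs =
    let m = Array.fold_left max neg_infinity xs in
    let e = Array.map (fun x -> exp ((x -. m) /. temperature)) xs in
    let s = Array.fold_left ( +. ) 0. e in
    Array.map (fun v -> v /. s) e
end

module Oml_functions = struct
  (* Placeholder for the ocephes-backed module: this stub only handles
     positive integers, where gamma n = (n - 1)!. *)
  let gamma x =
    if x < 1. || float_of_int (truncate x) <> x then
      invalid_arg "gamma stub: positive integers only"
    else
      let rec fact n = if n <= 1 then 1. else float_of_int n *. fact (n - 1) in
      fact (truncate x - 1)
end

module Oml_lite : sig
  module Functions : sig
    val softmax : ?temperature:float -> float array -> float array
  end
end = struct
  module Functions = Oml_lite_functions
end

module Oml : sig
  module Functions : sig
    include module type of Oml_lite.Functions
    val gamma : float -> float
  end
end = struct
  module Functions = struct
    include Oml_lite_functions
    include Oml_functions
  end
end

(* A hypothetical third-party library that only needs the lite signature. *)
module Third_party (F : module type of Oml_lite.Functions) = struct
  (* Index of the most probable entry after softmax. *)
  let most_likely xs =
    let p = F.softmax xs in
    let best = ref 0 in
    Array.iteri (fun i v -> if v > p.(!best) then best := i) p;
    !best
end

(* The same library works with either flavour. *)
module Lib_lite = Third_party (Oml_lite.Functions)
module Lib_full = Third_party (Oml.Functions)
```

This is the situation described earlier in the thread: a user of the library can move to the full oml without the library having to maintain a lite/full split of its own.
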
rleonid commented 8 years ago

Right, but Functions is packed into Statistics; an unfortunate place for it at the moment, but some sub-pack namespace is unavoidable, or should I get rid of those? Otherwise I would need two levels of the previous split: Oml_statistics and Oml_lite_statistics, which contain Stats_functions and Stats_lite_functions.

dbuenzli commented 8 years ago

Right, but Functions is packed into Statistics; an unfortunate place for it at the moment, but some sub-pack namespace is unavoidable, or should I get rid of those?

There's no problem in adapting this to more than one level. You don't need to introduce modules for the intermediate levels if they contain only modules, which means you can simply make the lite/non-lite split at the lowest level. You can then design your names in the namespacing module:

```ocaml
module Oml : sig
  (* ... *)
end = struct
  module Statistics = struct
    module Functions = struct
      include Oml_lite_functions
      include Oml_functions
    end
    (* ... *)
  end
end
```
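
A self-contained sketch of the nested case, assuming hypothetical leaf modules `Oml_lite_descriptive`, `Oml_lite_functions` and `Oml_functions` (in a real project these would be separate compilation units, the last one binding ocephes). Only the leaf that needs C is split; `Statistics` exists only inside the namespacing modules:

```ocaml
(* Hypothetical leaf implementation modules; in a real project each would be
   its own compilation unit. Only leaves that need C get a second variant. *)
module Oml_lite_descriptive = struct
  let mean xs = Array.fold_left ( +. ) 0. xs /. float_of_int (Array.length xs)
end

module Oml_lite_functions = struct
  let logistic x = 1. /. (1. +. exp (-. x))
end

module Oml_functions = struct
  (* Stand-in for an ocephes-backed function; plain OCaml here. *)
  let logit p = log (p /. (1. -. p))
end

(* Namespacing modules: Statistics is just a structure of modules, so no
   Oml_statistics / Oml_lite_statistics units are needed. *)
module Oml_lite = struct
  module Statistics = struct
    module Descriptive = Oml_lite_descriptive
    module Functions = Oml_lite_functions
  end
end

module Oml = struct
  module Statistics = struct
    module Descriptive = Oml_lite_descriptive (* unsplit leaf, shared as-is *)
    module Functions = struct                (* split leaf *)
      include Oml_lite_functions
      include Oml_functions
    end
  end
end
```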

but some sub-pack namespace is unavoidable, or should I get rid of those?

In general, avoid deeply nested hierarchies. Beyond two levels (not counting the toplevel namespace), things become a bit annoying to read and write, and people then tend to open them or define their own aliases, which is bad for readability.

rleonid commented 8 years ago

There will be a multi-level split for things such as Classification (which contains Naive_bayes, which must be split).

Regardless, you're advocating for manually packing the namespace modules. This is the approach that I wanted to avoid from the beginning. My main reservation against it is that it will become much more difficult to maintain as inter-module dependencies grow, and I think the solution will not be elegant.

@dbuenzli How about a deal: if I re-implement oml_lite with all of these manual packs, then you'll contribute new functionality to oml (or oml_lite)?

dbuenzli commented 8 years ago

My main reservation against it is that it will become much more difficult to maintain as inter-module dependencies grow, and I think the solution will not be elegant.

I'm unconvinced about the maintenance argument. If you are well principled, you can keep the mapping between the namespacing module and the implementations obvious. As far as elegance goes, I personally find a language-based approach much more elegant than pre-processing #ifdefs... It may also make your build system simpler.

@dbuenzli How about a deal: if I re-implement oml_lite with all of these manual packs, then you'll contribute new functionality to oml (or oml_lite)?

Ha! For once I wanted to be a user, so that I can throw away toy stuff like this.

rleonid commented 8 years ago

I'm unconvinced about the maintenance argument.

Says the guy asking someone else to do the maintenance. 😈

Ha! For once I wanted to be a user, so that I can throw away toy stuff like this.

Do we have a deal? I'll let you loosely interpret the terms and what you think is a comparable contribution.

dbuenzli commented 8 years ago

Says the guy asking someone else to do the maintenance. 😈

As someone who publishes a lot of packages, I really care about maintenance costs; I'm telling you what I think is best here and what I'd actually do personally. Compared to the pre-processing approach, you get help from the compiler, and it ensures that oml and oml-lite do not drift apart, whereas #ifdef spaghetti makes such drift much easier.
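
One way to make that compiler help explicit, as a hedged sketch with hypothetical module contents: when the full flavour reuses the lite code through `include` but overrides a function, a one-line signature ascription stops compiling if the two flavours ever drift apart. With separately preprocessed cppo builds, by contrast, there is no single compilation in which one variant is checked against the other.

```ocaml
(* Hypothetical lite namespacing module. *)
module Oml_lite = struct
  module Functions = struct
    let logistic x = 1. /. (1. +. exp (-. x))
  end
end

(* Hypothetical full namespacing module that reuses the lite code but
   overrides one function, e.g. with a faster or more accurate version. *)
module Oml = struct
  module Functions = struct
    include Oml_lite.Functions

    (* Override: for negative x, avoids evaluating exp of a large positive
       argument. Any override must keep the lite type. *)
    let logistic x =
      if x >= 0. then 1. /. (1. +. exp (-. x))
      else
        let e = exp x in
        e /. (1. +. e)

    let logit p = log (p /. (1. -. p))
  end
end

(* Compile-time check: if Oml.Functions ever stops providing every value of
   Oml_lite.Functions with the same type (say logistic drifted to
   float array -> float array), this line would no longer compile. *)
module Drift_check : module type of Oml_lite.Functions = Oml.Functions
```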

Do we have a deal? I'll let you loosely interpret the terms and what you think is a comparable contribution.

I'm afraid I won't have time in the foreseeable future, so I can't promise anything. Do what you think is best for the project...

rleonid commented 8 years ago

Resolved with #174