Closed rleonid closed 8 years ago
> I also think that your second point is valid. But it is true of all software. I face similar issues with regard to `core` and lots of other downstream packages. It is ultimately up to the second-party developer to make smart choices.

Not really... she can't do anything about it. Suppose she has a `lib` that depends only on `oml-lite`; the right move would be to depend on that one. Now suppose a user of `lib` wants to use the full `oml`: that user is stuck. It forces the second-party developer to replicate the `lib-lite`, `lib` structure, effectively imposing a build mess on everyone.
This wouldn't happen if you had a properly structured library where `oml` depends on `oml-lite` (note: whether you distribute them in OPAM separately or not is unrelated; the problem is that, with what you have now added to OPAM, `oml` does not reuse `oml-lite`).
> Do you mean something along the lines of distributing each of the sub-packs separately (ie. `Statistics`, `Classification` ... etc)?

Not necessarily. It is unclear to me what exactly is provided by the third-party libraries, so it's difficult for me to answer the question.
> `oml` does not reuse `oml-lite`

@rleonid how hard would it be for `oml` to reuse `oml-lite`?
@hammer The way things are currently structured it would be difficult.
I am hesitant to say impossible, but I actually don't know how to do it without restructuring the project. The main obstacle being the packing logic that gives `oml` the light namespace hierarchy (`Classification`, `Statistics` ... etc). I value this hierarchy because it allows us to semantically catalogue operations; `Classification.Descriminant` vs `Unsupervised.Descriminant` (to be implemented), or `Statistics.Hypothesis_test` vs `Regression.Test` (also to be implemented). Furthermore, if we were to actually separate `oml` into separate packages, I think these would make more sense to depend upon than a `C` vs non-`C` split.
While I think that @dbuenzli is making a good point about wanting `oml` to depend on `oml-lite`, it is an idealized point. At this point the set of people who could need a library that uses `oml-lite` and then also need `oml` is much smaller than the set of people who might be attracted to using `oml-lite` for calculations and the set of people who might want to contribute code.
> the main obstacle being the packing logic that gives oml the light namespace hierarchy (Classification, Statistics ... etc). I value this hierarchy because it allows us to semantically catalogue operations;

I don't see what prevents you from having `Oml_lite` and `Oml` namespacing modules, designed to be opened, which define the same toplevel modules, with `Oml` simply including those of `Oml_lite` and adding more.
It seems you are underusing the naming and structuring capabilities of the module system. You would not even need to publish more than one package: you can simply use `opam`'s depopts to build `Oml`'s library conditionally.
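
A minimal sketch of the namespacing idea described above, not oml's actual layout. Only the names `Oml_lite` and `Oml` come from the discussion; the `Descriptive` module, its `mean` and `variance` functions, and the second-party `Lib` module are hypothetical stand-ins.

```ocaml
(* Hypothetical stand-in for the pure OCaml part of the library. *)
module Oml_lite = struct
  module Descriptive = struct
    let mean xs =
      Array.fold_left ( +. ) 0.0 xs /. float_of_int (Array.length xs)
  end
end

(* Oml defines the same toplevel module names, includes the lite
   implementation and adds more. In a real split the additions would
   be the C-backed functions; here the addition is plain OCaml. *)
module Oml = struct
  module Descriptive = struct
    include Oml_lite.Descriptive

    let variance xs =
      let m = mean xs in
      mean (Array.map (fun x -> (x -. m) ** 2.0) xs)
  end
end

(* A hypothetical second-party library that depends only on the lite
   namespace. *)
module Lib = struct
  open Oml_lite

  let center xs =
    let m = Descriptive.mean xs in
    Array.map (fun x -> x -. m) xs
end

(* An end user opens the full namespace and still uses Lib unchanged,
   because Oml reuses the lite definitions instead of duplicating them. *)
let centered_variance =
  let open Oml in
  Descriptive.variance (Lib.center [| 1.0; 2.0; 3.0; 4.0 |])
```

Under these assumptions, switching a client from the lite to the full library is a matter of changing which namespace it opens.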
> While I think that @dbuenzli is making a good point about wanting oml to depend on oml-lite, it is an idealized point.

I wouldn't call that an idealized point. You asked for feedback about the approach, and here you have it: what you are doing is simply anti-modular.
In a library eco-system like `opam`, where dependency cones grow quickly, the problem could show up more quickly than you'd think. There are ways to provide `oml-lite` without introducing these problems in the eco-system, and I don't think they would need much restructuring at the API level beyond the introduction of the aforementioned names and a few `open`s in client code.
@dbuenzli How would you address the `Functions` module?
Writing special mathematical functions is tricky since it requires balancing the trade-offs of convergence, performance and accuracy. Rather than rewriting all of them in pure `OCaml`, I wrapped `Cephes`, a suitable library in `C` (ocephes). There will also be functions that are easily implemented in `OCaml` (ex. softmax). Now, AFAIK, since I made the compilation of `Functions` dependent on the presence of `ocephes`, the module will have different signatures in `oml` and `oml-lite` (or a library with or without this `C` dependency), thus preventing the loading of one given the other.
Would you recommend splitting up `Functions` entirely?
> @dbuenzli How would you address the Functions module?

```ocaml
module Oml_lite : sig
  module Functions : sig
    val softmax : ?temperature:float -> float array -> float array
  end
end = struct
  module Functions = Oml_lite_functions
end

module Oml : sig
  module Functions : sig
    include module type of Oml_lite.Functions
    val gamma : float -> float
    ...
  end
end = struct
  module Functions = struct
    include Oml_lite_functions
    include Oml_functions
  end
end
```
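
For readers who want to compile the sketch above, here is one self-contained variant. The bodies of `Oml_lite_functions` and `Oml_functions` are assumptions made for illustration: `softmax` is a plain OCaml implementation, and `gamma` uses a Lanczos approximation (intended here for arguments above 0.5) as a stand-in for the Cephes binding, which is not reproduced.

```ocaml
(* Stand-in for the pure OCaml implementation module. *)
module Oml_lite_functions = struct
  let softmax ?(temperature = 1.0) weights =
    let scaled = Array.map (fun w -> w /. temperature) weights in
    (* Subtract the largest value before exponentiating, for stability. *)
    let largest = Array.fold_left max neg_infinity scaled in
    let exps = Array.map (fun w -> exp (w -. largest)) scaled in
    let total = Array.fold_left ( +. ) 0.0 exps in
    Array.map (fun e -> e /. total) exps
end

(* Stand-in for the C-backed implementation module. A Lanczos
   approximation (g = 7, 9 coefficients) replaces the Cephes call. *)
module Oml_functions = struct
  let lanczos =
    [| 0.99999999999980993; 676.5203681218851; -1259.1392167224028;
       771.32342877765313; -176.61502916214059; 12.507343278686905;
       -0.13857109526572012; 9.9843695780195716e-6;
       1.5056327351493116e-7 |]

  let gamma x =
    let x = x -. 1.0 in
    let t = x +. 7.5 in
    let a = ref lanczos.(0) in
    for i = 1 to 8 do
      a := !a +. lanczos.(i) /. (x +. float_of_int i)
    done;
    sqrt (2.0 *. Float.pi) *. (t ** (x +. 0.5)) *. exp (-. t) *. !a
end

module Oml_lite : sig
  module Functions : sig
    val softmax : ?temperature:float -> float array -> float array
  end
end = struct
  module Functions = Oml_lite_functions
end

module Oml : sig
  module Functions : sig
    include module type of Oml_lite.Functions
    val gamma : float -> float
  end
end = struct
  module Functions = struct
    include Oml_lite_functions
    include Oml_functions
  end
end
```

The signature of `Oml.Functions` hides the `lanczos` helper, so only `softmax` and `gamma` are visible to clients.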
Right, but `Functions` is packed into `Statistics`; unfortunate place for it at the moment, but some sub-pack namespace is unavoidable, or should I get rid of those?
Otherwise I would need two levels of the previous split: `Oml_statistics` and `Oml_lite_statistics`, which have `Stats_functions` and `Stats_lite_functions`.
> Right, but Functions is packed into Statistics; unfortunate place for it at the moment, but some sub-pack namespace is unavoidable, or should I get rid of those?

There's no problem in adapting the thing for more than one level. You don't need to introduce modules for the other levels if they contain only modules. This means that you can simply make the split lite/non-lite at the lowest level. You can then design your names in the namespacing module:

```ocaml
module Oml : sig ...
end = struct
  module Statistics = struct
    module Functions = struct
      include Oml_lite_functions
      include Oml_functions
    end
    ...
  end
end
```
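
A compilable sketch of the two-level layout, assuming placeholder implementations. The `sigmoid` and `softplus` functions are hypothetical and chosen only to keep the example short; the point is that `Statistics` exists only inside the namespacing modules, while the lite/non-lite split lives in the leaf implementation modules.

```ocaml
(* Leaf implementation modules: the only place where the lite/non-lite
   split exists. Both function bodies are hypothetical placeholders. *)
module Oml_lite_functions = struct
  let sigmoid x = 1.0 /. (1.0 +. exp (-. x))
end

module Oml_functions = struct
  (* Stands in for a function that would come from the C binding. *)
  let softplus x = log1p (exp x)
end

(* The intermediate Statistics level contains only modules, so it needs
   no implementation module of its own in either namespace. *)
module Oml_lite = struct
  module Statistics = struct
    module Functions = Oml_lite_functions
  end
end

module Oml = struct
  module Statistics = struct
    module Functions = struct
      include Oml_lite_functions
      include Oml_functions
    end
  end
end

(* The same path works in both namespaces for the shared functions. *)
let lite_value = Oml_lite.Statistics.Functions.sigmoid 0.0
let full_value = Oml.Statistics.Functions.sigmoid 0.0
let extra_value = Oml.Statistics.Functions.softplus 0.0
```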
> but some sub-pack namespace is unavoidable, or should I get rid of those?

In general avoid too deeply nested hierarchies: beyond two levels (not counting the toplevel namespace) things become a bit annoying to read and write, and people then tend to `open` or define their own aliases, which is bad for readability.
There will be a multi-level split for things such as `Classification` (which contains `Naive_bayes`, which must be split).
Regardless, you're advocating for manually packing the namespace modules. This is the approach that I wanted to avoid from the beginning. My main reservation against it is that it will become much more difficult to maintain as inter-module dependencies will grow, and I think the solution will not be elegant.
@dbuenzli How about a deal: if I re-implement `oml_lite` with all of these manual packs, then you'll contribute new functionality to `oml` (or `oml_lite`)?
> My main reservation against it is that it will become much more difficult to maintain as inter-module dependencies will grow, and I think the solution will not be elegant.

I'm unconvinced about the maintenance argument. If you are well principled, you can keep the mapping between the namespacing module and the implementations obvious. As far as elegance goes, I personally find a language-based approach much more elegant than pre-processing ifdefs... it may also make your build system simpler.
> @dbuenzli How about a deal: if I re-implement oml_lite with all of these manual packs, then you'll contribute new functionality to oml (or oml_lite)?

Ha! For once I wanted to be a user... So that I can throw away toy stuff like this.

> I'm unconvinced about the maintenance argument.

Says the guy asking someone else to do the maintenance. 😈

> Ha! For once I wanted to be a user... So that I can throw away toy stuff like this.

Do we have a deal? I'll let you loosely interpret the terms and what you think is a comparable contribution.
> Says the guy asking someone else to do the maintenance. 😈

As someone who publishes a lot of packages, I really care about maintenance costs. I'm really telling you what I think is best here and what I'd actually do personally. Compared to the pre-processing approach, with this one you get help from the compiler, which ensures that `oml` and `oml-lite` do not drift apart, whereas `#ifdef` spaghetti makes such drift much easier.
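
One way to picture the "help from the compiler" argument is a shared signature that both namespaces are checked against at build time. This is a sketch under assumptions: the `LITE_FUNCTIONS` module type, the two check modules and the `softplus` placeholder are hypothetical names, not part of oml.

```ocaml
(* Hypothetical shared signature for the lite API. *)
module type LITE_FUNCTIONS = sig
  val softmax : ?temperature:float -> float array -> float array
end

module Oml_lite_functions = struct
  let softmax ?(temperature = 1.0) weights =
    let scaled = Array.map (fun w -> w /. temperature) weights in
    let largest = Array.fold_left max neg_infinity scaled in
    let exps = Array.map (fun w -> exp (w -. largest)) scaled in
    let total = Array.fold_left ( +. ) 0.0 exps in
    Array.map (fun e -> e /. total) exps
end

module Oml_functions = struct
  (* Placeholder for a function that would come from the C binding. *)
  let softplus x = log1p (exp x)
end

module Oml_lite = struct
  module Functions = Oml_lite_functions
end

module Oml = struct
  module Functions = struct
    include Oml_lite_functions
    include Oml_functions
  end
end

(* Build-time checks: if either namespace drops softmax or changes its
   type, these two lines stop compiling. *)
module Lite_check : LITE_FUNCTIONS = Oml_lite.Functions
module Full_check : LITE_FUNCTIONS = Oml.Functions
```

With conditional compilation, by contrast, each variant is only type-checked when it is the one being built, so a mismatch can go unnoticed until someone builds the other configuration.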
> Do we have a deal? I'll let you loosely interpret the terms and what you think is a comparable contribution.

I'm afraid I won't have time in the foreseeable future, so I can't promise anything. Do what you think is best for the project...
Resolved with #174
@dbuenzli wrote:

I agree about 1), though I am already living a miserable build life dealing with testing. Currently, the choice of `oml-lite` or `oml` is about binding to `C` code or not (or as some others have described it, "waiting half an hour for fortran to compile"). I think that it would be very difficult to get around this issue by not addressing the build.

I also think that your second point is valid. But it is true of all software. I face similar issues with regard to `core` and lots of other downstream packages. It is ultimately up to the second-party developer to make smart choices.

Do you mean something along the lines of distributing each of the sub-packs separately (ie. `Statistics`, `Classification` ... etc)? I generally agree (and think this is the long term goal) but at the moment I am hesitant to start separating because there are many inter-dependencies; both in how one thinks about the algorithms (ex. `LDA` can be used for `Classification` or `Unsupervised` learning) and how the code is written. Furthermore, one of the problems `oml` is trying to address directly is to group lots of separate functionality that may not be thoughtfully integrated. For example, the result of a regression analysis should have hypothesis tests easily associated with it.

This issue is certainly not closed. I hope that you can give `oml` (or `oml-lite`) a try and we can talk specifics.