scicloj / scicloj.ml.tribuo

Use Tribuo ML model in metamorph.ml
Eclipse Public License 1.0

Problem with tribuo as a transitive dependency #1

Closed kiramclean closed 2 months ago

kiramclean commented 3 months ago

When scicloj.ml.tribuo is included as a dependency in another project, starting a REPL fails with the following error:

Could not start nREPL server: Error building classpath. Could not find artifact org.tribuo:tribuo-all:jar:4.2.0 in central (https://repo1.maven.org/maven2/)

I believe this is due to the lack of support for BOM deps in tools.deps. I don't know if leiningen also has this issue. One possible workaround (insofar as this counts as one at all..) would just be to update the readme of this project to mention that one has to include tribuo-all explicitly in their own deps.edn for this library to work.
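A rough sketch of what that workaround could look like in a consuming project's deps.edn. The version is taken from the error message above. The `:extension "pom"` key is an assumption: it relies on tribuo-all being published as a POM-only aggregator rather than a jar, which would match the "Could not find artifact ...:jar:4.2.0" failure, and it should be verified against the tools.deps documentation before it goes in the readme.

```clojure
;; deps.edn of the project that depends on scicloj.ml.tribuo (sketch only)
{:deps
 {;; ... the existing scicloj.ml.tribuo entry stays as it is ...

  ;; Assumption: ask for the POM instead of the default jar, so that the
  ;; aggregator's own dependencies are resolved transitively.
  org.tribuo/tribuo-all {:mvn/version "4.2.0" :extension "pom"}}}
```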

An actual solution would be to track down all of the components of tribuo-all that are (or could possibly be) used, given the tribuo features supported in this library, and list them all explicitly as deps.

I believe it would be worth at least doing the readme update, since the error message is somewhat misleading if you don't already know what the underlying issue is.

behrica commented 3 months ago

I fully agree that we need documentation. This is complicated by the fact that the correct dependencies to add depend on the model actually used in the model specification:

{:tribuo-components [{:name "trainer"
                      :type "org.tribuo.classification.dtree.CARTClassificationTrainer"}]
 :tribuo-trainer-name "trainer"}

Each model type needs a different dependency.

behrica commented 3 months ago

I suggest we identify the core dependencies and add them as "fixed" dependencies, for sure.

behrica commented 3 months ago

The imports are done by tech.ml.dataset.tribuo

(:import [org.tribuo.classification Label LabelFactory]
         [org.tribuo DataSource Output OutputFactory Trainer Model MutableDataset
          Prediction]
         [org.tribuo.impl ArrayExample]
         [org.tribuo.provenance SimpleDataSourceProvenance]
         [org.tribuo.regression RegressionFactory Regressor]
         [org.tribuo.regression.evaluation RegressionEvaluator RegressionEvaluation]
         [com.oracle.labs.mlrg.olcut.config ConfigurationManager]
         [com.oracle.labs.mlrg.olcut.config.json JsonConfigFactory])

The code here uses:

(:import [org.tribuo.regression.evaluation RegressionEvaluator]
         [org.tribuo.regression Regressor])

behrica commented 3 months ago

I added the needed "core" dependencies to the deps in this branch: https://github.com/scicloj/scicloj.ml.tribuo/commit/6f2bae5f82d3709ff4d7d0fa6fa9e9bd73ea7a38

This still requires a user of scicloj.ml.tribuo to add the deps of the model they use, as I did for the tests: https://github.com/scicloj/scicloj.ml.tribuo/blob/6f2bae5f82d3709ff4d7d0fa6fa9e9bd73ea7a38/deps.edn#L19

So we need to document this here.
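For illustration, a user who configures the CARTClassificationTrainer from the model specification earlier in this thread would add something like the entry below to their own deps.edn. This is a sketch under assumptions: the artifact name `tribuo-classification-tree` is my reading of where the `org.tribuo.classification.dtree` package lives and should be checked against Tribuo's package overview, and the version is assumed to match the 4.2.0 from the original error message.

```clojure
;; deps.edn of the consuming project (sketch only)
{:deps
 {;; ... the existing scicloj.ml.tribuo entry stays as it is ...

  ;; Assumed to be the artifact that provides
  ;; org.tribuo.classification.dtree.CARTClassificationTrainer
  org.tribuo/tribuo-classification-tree {:mvn/version "4.2.0"}}}
```

Other model types would need the corresponding Tribuo artifact in place of this one.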

behrica commented 3 months ago

Which model lives in which "component", and therefore in which dependency, is fully documented in this table, so we can refer to it: https://github.com/oracle/tribuo/blob/main/docs/PackageOverview.md

behrica commented 3 months ago

@kiramclean I propose to merge this in: https://github.com/scicloj/scicloj.ml.tribuo/tree/unpackTribuoDeps

and then you can try it.

kiramclean commented 3 months ago

This makes sense. It's cool that they split the deps out into smaller jars that can be included individually to minimize the final package size, but sad that it creates this downstream problem for libraries that consume tribuo. Anyway, this seems like a reasonable solution. We should update the readme and include a link to that list to make it clear which libs need to be included for which models. I can also make a note of that in all the upcoming tutorials/book chapters etc. Thanks for looking into this! I think Tribuo is a good solution for ML models going forward. It seems to work smoothly and well across platforms.

behrica commented 2 months ago

Fixed in #2. The documentation was updated in commit f4ebf1e1bb78eb99dd35ca886d75b9f65d800e8d