tiemvanderdeure / SpeciesDistributionModels.jl

MIT License

Lighten the MLJ dependency? #1

Closed ablaom closed 9 months ago

ablaom commented 9 months ago

Just noticed that you have MLJ as a dep here. Depending on your objectives, you may be able to lighten that. MLJ itself just imports a bunch of components. So, for example, maybe you just need MLJBase and StatisticalMeasures.

Here is what the various components do:

help?> MLJ
search: MLJ MLJType MLJFlow MLJOpenML MLJ_VERSION MLJIteration multitarget_l2

  MLJ

  MLJ (https://alan-turing-institute.github.io/MLJ.jl/dev/) is a Machine
  Learning toolbox for Julia. It collects together functionality from the
  following packages, which can be loaded separately:

    •  MLJBase.jl: The machine interface, tools to partition and unpack
       datasets, evaluate/evaluate! for model performance, |> pipeline
       syntax, TransformedTargetModel wrapper, general model composition
       syntax (learning networks), synthetic data generators, scitype and
       schema methods (from ScientificTypes.jl) for checking how MLJ
       interprets your data

    •  StatisticalMeasures.jl: MLJ-compatible measures (metrics) for
       machine learning, confusion matrices, ROC curves.

    •  MLJModels.jl: Common transformers for data preprocessing,
       searching the model registry, loading models with @load

    •  MLJTuning.jl: Hyperparameter optimization via TunedModel wrapper

    •  MLJIteration.jl: IteratedModel Wrapper for controlling iterative
       models

    •  MLJEnsembles.jl: Homogeneous model ensembling, via the
       EnsembleModel wrapper

    •  MLJBalancing.jl: Incorporation of oversampling/undersampling
       methods in pipelines, via the BalancedModel wrapper

    •  OpenML.jl: Tool for grabbing datasets from OpenML.org

If you only need a few third-party models, you can import them directly from their interface packages (see below) and skip the @load convenience loader from MLJModels:

julia> import MLJDecisionTreeInterface
julia> Tree = MLJDecisionTreeInterface.DecisionTreeClassifier
julia> tree = Tree()
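
To make the "lighter dependency" idea concrete, here is a rough sketch of fitting and evaluating that tree with only MLJBase and StatisticalMeasures loaded, rather than the MLJ umbrella package. It assumes MLJBase, StatisticalMeasures and MLJDecisionTreeInterface are all installed. The synthetic data, the `max_depth` setting and the choice of AUC as the measure are illustrative assumptions, not something taken from this package.

```julia
# Assumption: MLJBase, StatisticalMeasures and MLJDecisionTreeInterface are installed.
import MLJBase
import StatisticalMeasures
import MLJDecisionTreeInterface

# Synthetic two-class data standing in for presence/absence records (illustrative only).
X, y = MLJBase.make_blobs(200, 3; centers=2, rng=123)

Tree = MLJDecisionTreeInterface.DecisionTreeClassifier
tree = Tree(max_depth=3)

# Bind the model to the data, then estimate performance with 5-fold cross-validation.
mach = MLJBase.machine(tree, X, y)
result = MLJBase.evaluate!(
    mach;
    resampling=MLJBase.CV(nfolds=5, shuffle=true, rng=123),
    measure=StatisticalMeasures.auc,
    verbosity=0,
)

# Aggregated AUC across the folds.
println(result.measurement[1])
```

Nothing here touches MLJModels, MLJTuning or the other components, so those would not need to be dependencies for this kind of workflow.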

tiemvanderdeure commented 9 months ago

Thanks for chipping in @ablaom!

I think you are right and we can easily get away with using just a few parts of the MLJ ecosystem.

Good to see it is so easy to get rid of @load.

There are just 5 or maybe 6 classifiers that are commonly used in species distribution modelling. We want to make it very straightforward for people to find the models they need, with settings and names similar to what people are used to from comparable packages in R. One possibility is to just add them as dependencies; we also discussed having something like a load_recommended() function.
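
As a purely hypothetical sketch of what a load_recommended() function could look like if the model interface packages were direct dependencies: the function body, the short names and the default hyperparameters below are all assumptions for illustration, and only two tree-based models from MLJDecisionTreeInterface are shown.

```julia
# Hypothetical sketch: assumes MLJDecisionTreeInterface is a direct dependency.
import MLJDecisionTreeInterface: DecisionTreeClassifier, RandomForestClassifier

"""
    load_recommended()

Return a NamedTuple of ready-to-use classifiers keyed by short, familiar names.
The names and default settings are illustrative, not the package's actual choices.
"""
function load_recommended()
    return (
        decision_tree = DecisionTreeClassifier(max_depth=5),
        random_forest = RandomForestClassifier(n_trees=500),
    )
end

models = load_recommended()

# List the short names a user could pick from.
println(keys(models))
```

Because the models are imported directly, this does not rely on @load or on MLJModels; adding another classifier would mean one more import and one more entry in the NamedTuple.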

In any case, being able to build this on top of MLJ is really convenient as it will be super easy to add more models.

tiemvanderdeure commented 9 months ago

Solved by https://github.com/tiemvanderdeure/SpeciesDistributionModels.jl/pull/4