Closed sylvaticus closed 11 months ago
You could remove the quotes in the table of page https://sirus.jl.huijzer.xyz/dev/binary-classification/
Done.
I still think (personal and debatable opinion) that the "basic example" still looks more complex than it should be. If an user run PKGDIR = dirname(dirname(@DIR)) or DOCS_DIR = dirname(@DIR), it would find itself in a different directory than the one in the documentation script. Also, users are not supposed to develop a package in order to use it.
Fixed.
If, for example, you add the (small) dataset to the repository as [SIRUS_FOLDER]/test/data/haberman.csv you could simply use something like that as minimal example for a user:
I have considered this, but am hesitant to add data to the source code. I have also considered a less heavy approach via the Julia (Lazy) Artifact system. However, the main issue I have is with the question: If we add a dataset, shouldn't we add more datasets? I think no dataset is a requirement for the package and therefore they should not be linked.
This would also have the advantage that you could use the dataset easily in your package tests (without download from a remote server).
Datadeps does cache the file, so in many cases the test will not require a download. DataDeps is not yet cached via the GitHub Actions Cache (https://github.com/julia-actions/cache/issues/40), but that might be fixed in the future.
I have also removed the StableRng as I think replicability is not the focus of a minimal example and the begin and let blocks.
Thanks for providing the example code, which made the example easy to understand. I have removed the begin
and let
blocks and replaced them by functions. Functions are often considered as superior to comments in code as they provide clarity while ensuring that the desciption matches the code.
I think some content of "Advanced example" could go to "Implementation overview". Also "Benchmarks" could perhaps go on its own section so that "advanced example" could focus on how to use the various options offered by the model (the hyperparameters)
I haven't done this yet, but start working on it.
One minor issue with the example in the documentation. When you try to run the code as described in the example you will have the error on the line:
StableRulesClassifier = @load StableRulesClassifier pkg="SIRUS"
ERROR: cannot assign a value to imported variable SIRUS.StableRulesClassifier from module Main
Because you have already imported StableRulesClassifier
in the first chunk of the script. Please remove using SIRUS: StableRulesClassifier
from that first chunk.
Also, I had some errors in typing the using..
statements in that order, I suggest to write the using DataFrames
BEFORE the line using DataDeps: DataDeps, DataDep, @datadep_str
I think some content of "Advanced example" could go to "Implementation overview". Also "Benchmarks" could perhaps go on its own section so that "advanced example" could focus on how to use the various options offered by the model (the hyperparameters)
Could we consider this out-of-scope for now? Then I can focus on the issues that directly affect the paper's content. I've opened an issue for this at https://github.com/rikhuijzer/SIRUS.jl/issues/62.
StableRulesClassifier = @load StableRulesClassifier pkg="SIRUS"
Because you have already imported
StableRulesClassifier
in the first chunk of the script. Please removeusing SIRUS: StableRulesClassifier
from that first chunk.
It looks like https://github.com/JuliaAI/MLJModels.jl/pull/526 hasn't fixed the issue that we ran into earlier because
pkg> activate --temp
pkg> add MLJModels
Resolving package versions...
Updating `/private/var/folders/nf/xynnllys6nl13xhj67sqgt7c0000gn/T/jl_AG8cXc/Project.toml`
[d491faf4] + MLJModels v0.16.12
[...]
pkg> add SIRUS
Resolving package versions...
Updating `/private/var/folders/nf/xynnllys6nl13xhj67sqgt7c0000gn/T/jl_AG8cXc/Project.toml`
[cdeec39e] + SIRUS v1.3.3
[...]
julia> using MLJModels
julia> RulesClassifier = @load StableRulesClassifier pkg="SIRUS" verbosity=0
SIRUS.MLJImplementation.StableForestClassifier
So I'm using the latest versions of MLJModels and SIRUS but still StableRulesClassifier
loads the StableForestClassifier
. This is a bug that needs further investigation (I've opened an issue at https://github.com/rikhuijzer/SIRUS.jl/issues/61). For now, I've fixed the documentation by removing the @load
in https://github.com/rikhuijzer/SIRUS.jl/commit/ca1c02bc405a5ea2562d5e944f94d8e942ead59b.
Also, I had some errors in typing the
using..
statements in that order, I suggest to write theusing DataFrames
BEFORE the lineusing DataDeps: DataDeps, DataDep, @datadep_str
Did you get warnings during the loading of using DataFrames
? Could it be that DataFrames was outdated in the global Julia environment? I just ran the following without warnings after updating my global environment:
pkg> activate --temp
julia> begin
using CategoricalArrays: categorical
using CSV: CSV
using DataDeps: DataDeps, DataDep, @datadep_str
using DataFrames
using MLJ
using StableRNGs: StableRNG
using SIRUS: StableRulesClassifier
end
│ Packages [CategoricalArrays, DataDeps, MLJ, StableRNGs, SIRUS] not found, but packages named [CategoricalArrays, DataDeps, MLJ, StableRNGs, SIRUS] are available from a registry.
│ Install packages?
│ (jl_cHDhkF) pkg> add CategoricalArrays DataDeps MLJ StableRNGs SIRUS
└ (y/n/o) [y]:
[...]
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`
julia>
Sorry, by the way, for not understanding earlier that you expect to be able to run the example from the command line. I personally almost never do data science from the command line, so that's why I didn't see it before.
I just ran the whole basic example on the command line and it should now work. Please let me know if you run into issues.
Hello, some suggestion concerning the documentation.
You could remove the quotes in the table of page https://sirus.jl.huijzer.xyz/dev/binary-classification/
I still think (personal and debatable opinion) that the "basic example" still looks more complex than it should be. If an user run
PKGDIR = dirname(dirname(@__DIR__))
orDOCS_DIR = dirname(@__DIR__)
, it would find itself in a different directory than the one in the documentation script. Also, users are not supposed to develop a package in order to use it.If, for example, you add the (small) dataset to the repository as
[SIRUS_FOLDER]/test/data/haberman.csv
you could simply use something like that as minimal example for a user:This would also have the advantage that you could use the dataset easily in your package tests (without download from a remote server). I have also removed the StableRng as I think replicability is not the focus of a minimal example and the
begin
andlet
blocks.I think some content of "Advanced example" could go to "Implementation overview". Also "Benchmarks" could perhaps go on its own section so that "advanced example" could focus on how to use the various options offered by the model (the hyperparameters)
https://github.com/openjournals/joss-reviews/issues/5786