Closed sylvaticus closed 1 year ago
This is a nice suggestion. Thank you for trying SIRUS out. Guillaume Dalle agrees on this issue in https://github.com/rikhuijzer/SIRUS.jl/issues/49. I've added a basic example to https://sirus.jl.huijzer.xyz/dev/basic-example/ and am adding this to the paper too. I'll request a new PDF to be generated in the review issue once I'm done updating the paper.
- First to reproduce it I need to use the whole register_haberman() and _haberman() functions.
This should now be solved. It's still 9 lines of code, but I hope this is okay. I really like the DataDeps checksum to ensure that the dataset is the right one.
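For readers unfamiliar with DataDeps.jl, the registration pattern referred to above can be sketched roughly as follows. This is a minimal sketch, not the actual `register_haberman()` from the repository; the URL and checksum below are placeholders:

```julia
using DataDeps

# Hypothetical sketch of registering a dataset with a checksum.
# DataDeps downloads the file on first use and verifies the hash,
# which is what guarantees that the dataset is the right one.
function register_haberman()
    name = "Haberman"
    message = "Haberman's Survival Data Set"
    remote_path = "https://example.com/haberman.csv"  # placeholder URL
    checksum = "0123456789abcdef"  # placeholder; put the real SHA-256 here
    register(DataDep(name, message, remote_path, checksum))
end
```

After registration, `datadep"Haberman"` resolves to a local directory containing the verified file.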
- You are mixing MLJ and non-MLJ API together and I think this is distracting.
You're right on that. My reasoning was that the MLJ API is non-standard Julia because `@load` loads symbols from packages in the background. I'm not a fan of such dynamic loading because it has bitten me a few times when trying to get R packages to work. Still, I have incorporated your suggestion in the tutorial and show both ways. I hope that's okay.
- Related to the previous point, I can't find in the MLJ constructor the option max_rule that you describe in the paper
You found a bug! Thank you. It turns out that `@load StableRulesClassifier pkg="SIRUS"` actually loaded the `StableForestClassifier` (which doesn't have the `max_rules` hyperparameter) instead of the `StableRulesClassifier`. I hope that https://github.com/JuliaAI/MLJModels.jl/pull/526 will be merged so that the issue gets resolved. Until then, `using SIRUS: StableRulesClassifier` works.
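For reference, the two ways of obtaining the model can be sketched as follows (a minimal sketch; the second way assumes the MLJModels registry fix linked above has been merged):

```julia
# Way 1: plain Julia import, which works regardless of the MLJModels registry:
using SIRUS: StableRulesClassifier

# Way 2: the MLJ interface, which loads the model code dynamically
# (commented out here since both ways bind the same name):
# using MLJ
# StableRulesClassifier = @load StableRulesClassifier pkg="SIRUS"

# Either way, the hyperparameters from the paper are then available:
model = StableRulesClassifier(; max_depth=2, max_rules=8)
```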
- The `mach.fitresult` call to me doesn't return the nice rule base that you present in the paper but rather an ugly:
This is also the result of `StableForestClassifier` versus `StableRulesClassifier`.
- I have tried to apply the model to the iris dataset but I got predictions that don't even define a probability distribution (bug for multiclasses??):
Do you mean that the output should be a different type than `UnivariateFinite`? I chose this type because it's the MLJ.jl standard. Also, the random forest model, with these predictions, is tested via MLJ and achieves quite good performance scores, see for example the GitHub Actions Job Summary at https://github.com/rikhuijzer/SIRUS.jl/actions/runs/6171051743/attempts/1#summary-16748537050. The summary shows that the `StableForestClassifier` often performs similarly to the `XGBoostClassifier`.
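For context, a typical MLJ round trip with these predictions can be sketched as below. This is a hedged sketch: it assumes `X` and `y` from the Haberman example are already in scope, and the class label passed to `pdf` is illustrative:

```julia
using MLJ
using SIRUS: StableRulesClassifier

model = StableRulesClassifier(; max_depth=2, max_rules=8)
mach = machine(model, X, y)  # X, y as in the Haberman example
fit!(mach)

# Probabilistic MLJ classifiers return UnivariateFinite distributions,
# one per observation; probabilities can be extracted with pdf:
preds = predict(mach, X)
probs = pdf.(preds, 1)  # per-observation probability of the class labeled 1
```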
Hello, I am trying to replicate the example in the JOSS paper but I have some issues.

- First, to reproduce it I need to use the whole `register_haberman()` and `_haberman()` functions. This data processing step distracts from the objective of showing SIRUS, so perhaps you can find some other simple way to retrieve an already-nicely-formatted dataset to work on, or you may want to put `X` and `y` already nicely formatted somewhere that you (and the reader) can retrieve them with an `HTTP.get` or a `Downloads.download`...
- You are mixing MLJ and non-MLJ API together and I think this is distracting. For example, `model = StableRulesClassifier(; max_depth=2, max_rules=8)` is not MLJ API. Instead it should be:
- Related to the previous point, I can't find in the MLJ constructor the option `max_rule` that you describe in the paper.
- The `mach.fitresult` call to me doesn't return the nice rule base that you present in the paper but rather an ugly:
- I have tried to apply the model to the iris dataset but I got predictions that don't even define a probability distribution (bug for multiclasses??):
I believe that it is useful to show an example that, while being as concise and simple as possible, is also reproducible.
https://github.com/openjournals/joss-reviews/issues/5786