Closed: RaphaelS1 closed this 1 year ago.
I regularly update the PDF that is linked in the PR comment.
I will also address other things (e.g. coherent vocabulary) and check the other chapters to make sure things are consistent. A few comments on some points, after discussion with Bernd:
> pipeline plots need to be changed

I'll see if I can make the $plot() output prettier, and also change the IDs to make things more readable.
> tables are printed that aren't useful

This should already be a bit better, but I will check again later.
> Avoid examples that deliberately result in errors

I actually disagree on this: since the examples result in errors, there is no chance the user will accidentally mess anything up. But Bernd is not on my side here, so I will take this out.
> Printing the DictionaryPipeOp is not useful

I took this out, but now Bernd asked me to include it again, and I think other chapters also show the content of their dictionaries.
> provided as so-called PipeOps

I changed the wording from "so-called" to "what we call 'PipeOps'", which should make it clear that we are introducing this as a new term here.
> To be consistent with the rest of the book change iris -> penguins

I will likely change the datasets around to make the effect of tuning the preprocessing step more obvious, but I don't know yet if it will end up being penguins or something else.
> Don't need a separate section for $keep_results

I talked to Bernd about this; our conclusion was that debugging is an important topic, so we will keep this section and instead try to move other related things into it.
> Pipelines Hyperparameter

After discussion with Bernd, we decided to keep this.
> Make clear distinction between .train/train and .predict/predict

Are you sure you were not reading the extending.qmd file here? Also, not exactly a constructive comment.
TODOs from Bernd:
> I took this out but now Bernd asked me to include it again, and I think other chapters also show the content of their dictionaries.

They're not meant to; it's now all in an appendix at the end. @berndbischl let's discuss and align on this to prevent confusion, see the appendix I've made here.
> changed the wording from "so-called" to "what we call 'PipeOps'", should make it clear we are introducing this as a new term here

To be consistent with other parts of the book, this could simply be "provided as 'PipeOps'": the quote marks imply a new term, and elsewhere we do the same.
> but idk if it will end up being penguins or something else.

Why not penguins? We should be consistent with other parts of the book unless there's a clear reason not to be.
> Are you sure you were not reading the extending.qmd file here?

Yeah, sorry, I think this might have been for an old version.
> Why not penguins?

We want an example where the benefit of tuning the preprocessing is shown clearly, e.g. where some well-known and simple operation like PCA improves performance noticeably. I don't know if penguins will do that and will need to try out a few things.
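For context, one quick way to screen candidate datasets would be a small benchmark comparing a learner with and without a PCA step. This is only a sketch using mlr3/mlr3pipelines; the sonar task here is a placeholder, not a decided choice:

```r
library(mlr3)
library(mlr3pipelines)

# Compare a plain learner against the same learner preceded by PCA,
# to see whether PCA makes a visible difference on this dataset.
task = tsk("sonar")  # placeholder task; swap in any candidate dataset
learners = list(
  lrn("classif.rpart"),
  as_learner(po("pca") %>>% lrn("classif.rpart"))
)
design = benchmark_grid(task, learners, rsmp("cv", folds = 5))
bmr = benchmark(design)
bmr$aggregate(msr("classif.ce"))
```

If the aggregated classification error differs clearly between the two rows, the dataset is a good candidate for the tuning example.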
@mb706 please choose a dataset which really works, doesn't matter if penguins or not
@RaphaelS1 a dataset working is a clear reason for me. We should not feature examples where the shown operation "doesn't work"; that's somewhat hard to set up, and we should therefore allow deviations.
I agree. I wasn't saying we are forced to stick to one dataset, just that we need to be clear why we make changes when we do, and more importantly, we should document every dataset we use throughout the book so we can explain them in the appendices.
Based on https://github.com/mlr-org/mlr3book/pull/385
Style

Creating PipeOps

- Avoid examples that deliberately result in errors. Assume people will do what you tell them and can debug for themselves (or at least can open issues).
- Printing the DictionaryPipeOp is not useful - it's hard to read and users haven't learnt anything. Better to manually print a table of (non-meta) PipeOps and explain these.

Content
PipeOp: Pipeline Operators

- This is confusing and will isolate readers - I really don't know what 'generalized notion of a function, with a certain twist' means. Simplify the writing without dumbing down the content.
- $train: then this is still actually only one input.
- id is confusing. Just show this in a worked example at some point in the chapter and explain it then; it doesn't need to be taken on its own.
- po output is confusing and needs to be explained to the user - it is not intuitive on its own.
- Don't need a separate section for $keep_results. Just briefly mention it somewhere else (in fact you do this in 'Accessing Pipeline Objects', which is probably sufficient).
- as_learner function". Later you can then put it all together.
- Pipeline Hyperparameters: Too much repetition about paradox and content in the optimization chapter. It would be more interesting just to say that one advantage of treating a graph like a learner is the ability to tune it, and then just show an example with AutoTuner.
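For reference, the kind of single AutoTuner example suggested here could look roughly as follows. This is a sketch, not the chapter's actual code; the task, tuner, budget, and tuned parameters are placeholder assumptions:

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)

# Treat the whole graph as a learner, then tune a preprocessing
# hyperparameter (the PCA rank) jointly with the model.
glrn = as_learner(po("pca") %>>% lrn("classif.rpart"))
glrn$param_set$values$pca.rank. = to_tune(1, 10)
glrn$param_set$values$classif.rpart.cp = to_tune(1e-3, 0.1, logscale = TRUE)

at = auto_tuner(
  tuner      = tnr("random_search"),
  learner    = glrn,
  resampling = rsmp("cv", folds = 3),
  measure    = msr("classif.ce"),
  term_evals = 20  # placeholder budget
)
at$train(tsk("sonar"))  # placeholder task
```

The point being made in the review: one compact example like this covers both "graph as learner" and "tuning preprocessing" without repeating the paradox material.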
- IDs and Name Clashes: firstly this duplicates your discussion above about IDs (though that should be deleted), and secondly it doesn't need to be a section. Instead just show a worked example where you have to manually set IDs and add a comment explaining why you did this. Or we could even add an emoji in the margin for something like 'important concept' (e.g., ❗).
- Advanced Graphs: remember that for many users even the 'simple' thing is advanced. So I'd suggest changing the name to something like 'Non-linear Graphs'.
- Bagging: nowhere in the section does it say you have a bagging pipeline. But I suspect this will go in 'Common Patterns and ppl()'? Need to explain innum. A more interesting example might be to use classifavg for multiple different models; then you can say the bagging pipeline is shorthand for when all models are the same (but I will leave the choice to you).
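A minimal sketch of the classifavg suggestion, averaging two different models (the learners and task are placeholders, not a recommendation):

```r
library(mlr3)
library(mlr3pipelines)

# Average the probability predictions of two different learners.
# A bagging pipeline is then just the special case where all models
# are the same learner, each trained on subsampled data.
graph = gunion(list(
  lrn("classif.rpart", predict_type = "prob"),
  lrn("classif.featureless", predict_type = "prob")
)) %>>% po("classifavg", innum = 2)

glrn = as_learner(graph)
glrn$train(tsk("sonar"))  # placeholder task
```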
- PipeOpLearnerCV and Stacking: Section broken?
- Stacking: what does NULL mean in your figure? Arrows inside the figure might help to understand it. Be careful about throwing in new terms without explaining them, e.g. 'overfitting'. Need to explain "level 0 learner" and PipeOpNOP in more detail; they're complex to understand. The output at the end is a really nice way to show this, but could you consider a different output method, e.g. plotting the decision tree? It will be simpler to follow and will look better when printed.
- Tuning Combined Spaces: duplicative of the hyperparameter content above. As said above, you just need one AutoTuner example and you're done.
- Tuning Alternative Paths: what do you mean by 'only one path will be executed, the others are ignored' for branching? I'm not sure the figure helps explain this; maybe use arrows to make the figure clearer? I think the example is too hard to follow; it doesn't help me understand when to use branching or when to use nop.
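For the branching discussion, a minimal example may help make 'only one path is executed' concrete. This is a sketch; the task and option names are assumptions:

```r
library(mlr3)
library(mlr3pipelines)

# branch routes the data down exactly one of the named paths at run
# time; the "nop" path applies no preprocessing, the "pca" path
# applies PCA. unbranch merges the paths back together.
graph = po("branch", options = c("nop", "pca")) %>>%
  gunion(list(po("nop"), po("pca"))) %>>%
  po("unbranch", options = c("nop", "pca")) %>>%
  lrn("classif.rpart")

glrn = as_learner(graph)
# which path runs is itself a hyperparameter, so it can be tuned:
glrn$param_set$values$branch.selection = "pca"
glrn$train(tsk("sonar"))  # placeholder task
```

The nop path is what makes "no preprocessing" a tunable alternative alongside PCA.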
- Tuning PipeOpProxy: TODO.
- "ppl()": very important section, though currently it looks like the wrong content is there?
- Multilevel Stacking: I like the idea of showing one very complicated example, but I think it would be better if you put this in the conclusion of the chapter and before it wrote something like "and now we'll show you the true power of pipelines".
- Graph, The Whole Story: change the name to something a bit more intuitive like 'Technical Graph Details' and we'll mark the section with an advanced emoji in the margin. Make clear in the intro to this section that it is advanced and also why someone might want to read it (i.e., what extra content do you teach). Just be careful not to write too much or it will distract from the rest of the book.
- Specific PipeOps: probably makes sense to move this before the previous section. Also, in the intro make clear why you're talking about these ones specifically. This section is nice; it gives a good quick overview of PipeOps and shows how to use them.
- PipeOpChunk section

General