mlr-org / mlr3book

Online version of Bischl, B., Sonabend, R., Kotthoff, L., & Lang, M. (Eds.). (2024). "Applied Machine Learning Using mlr3 in R". CRC Press.
https://mlr3book.mlr-org.com/
MIT License

Chapter 6 comments #455

Closed RaphaelS1 closed 1 year ago

RaphaelS1 commented 1 year ago

Based on https://github.com/mlr-org/mlr3book/pull/385

Style

Content

General

mb706 commented 1 year ago

I regularly update the PDF that is linked in the PR comment.

There are other things I will address as well (e.g. coherent vocabulary); I'll also check the other chapters to make sure things are consistent. A few comments on some points, after discussion with Bernd:

pipeline plots need to be changed

I'll see if I can make the $plot() output prettier, and also change the IDs to make things more readable.
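For readers following along, a minimal sketch of the $plot() and ID mechanics being discussed, assuming the standard mlr3pipelines API (the actual graph from the chapter is not reproduced here):

```r
library(mlr3)
library(mlr3pipelines)

# Custom IDs passed to po() become the node labels in the plot,
# which is one way to make the output more readable.
graph = po("scale", id = "standardize") %>>%
  po("pca", id = "rotate") %>>%
  lrn("classif.rpart")

# Draw the pipeline DAG; this is the $plot() under discussion.
graph$plot()
```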

tables are printed that aren't useful

This should already be a bit better, but I will check again later.

Avoid examples that deliberately result in errors

I actually disagree on this: since the examples result in errors, there is no chance the user will accidentally mess anything up. But Bernd is not on my side here, so I will take this out.

Printing the DictionaryPipeOp is not useful

I took this out but now Bernd asked me to include it again, and I think other chapters also show the content of their dictionaries.

provided as so-called PipeOps

Changed the wording from "so-called" to "what we call 'PipeOps'"; this should make it clear we are introducing this as a new term here.

To be consistent with the rest of the book change iris -> penguins

I will likely change the datasets around to make the effect of tuning the preprocessing step more obvious, but I don't know yet if it will end up being penguins or something else.

Don't need a separate section for $keep_results

I talked to Bernd about this; our conclusion was that debugging is an important topic, so we will keep this and instead try to move other related things to this section.
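For context, a hedged sketch of the $keep_results debugging workflow this section covers, using placeholder ops and the iris task rather than the chapter's actual example:

```r
library(mlr3)
library(mlr3pipelines)

graph = po("scale") %>>% po("pca")

# With keep_results enabled, each PipeOp retains its output after
# $train(), so intermediate pipeline steps can be inspected for debugging.
graph$keep_results = TRUE
graph$train(tsk("iris"))

# Inspect the intermediate result of the "scale" step.
graph$pipeops$scale$.result
```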

Pipeline Hyperparameters

After discussion with Bernd, we decided to keep this.

Make clear distinction between .train/train and .predict/predict

Are you sure you were not reading the extending.qmd file here? Also, this is not exactly a constructive comment.

TODOs from Bernd:

RaphaelS1 commented 1 year ago

I took this out but now Bernd asked me to include it again, and I think other chapters also show the content of their dictionaries.

They're not meant to; it's now all in an appendix at the end. @berndbischl let's discuss and align on this to prevent confusion; see the appendix I've made here.

Changed the wording from "so-called" to "what we call 'PipeOps'"; this should make it clear we are introducing this as a new term here.

To be consistent with other parts of the book this could simply be "provided as 'PipeOps'"; the quote marks imply a new term, and elsewhere we do the same.

but I don't know yet if it will end up being penguins or something else.

Why not penguins? We should be consistent with other parts of the book unless there's a clear reason not to be.

Are you sure you were not reading the extending.qmd file here?

Yeah, sorry, I think this might have been for an old version.

mb706 commented 1 year ago

Why not penguins?

We want to have an example where the benefit of tuning the preprocessing is shown clearly, e.g. where some well-known and simple operation like PCA improves performance noticeably. I don't know if penguins will do that; I will need to try out a few things.
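To illustrate the kind of example being sought, a hedged sketch of tuning a PCA preprocessing step jointly with a learner; the task, tuning ranges, and grid resolution here are placeholders, not what ended up in the chapter:

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)

# Pipeline: PCA preprocessing followed by a decision tree.
glrn = as_learner(po("pca") %>>% lrn("classif.rpart"))

# Tune the number of retained components jointly with a learner parameter.
glrn$param_set$set_values(
  pca.rank.        = to_tune(1, 4),
  classif.rpart.cp = to_tune(0.001, 0.1)
)

# Cross-validated grid search over the joint search space.
instance = tune(
  tuner      = tnr("grid_search", resolution = 4),
  task       = tsk("sonar"),
  learner    = glrn,
  resampling = rsmp("cv", folds = 3)
)
```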

berndbischl commented 1 year ago

@mb706 Please choose a dataset which really works; it doesn't matter if it's penguins or not.

berndbischl commented 1 year ago

@RaphaelS1 A dataset working is a clear reason for me. We should not feature examples where the shown operation "doesn't work". That's somewhat hard to set up, and we should therefore allow deviations.

RaphaelS1 commented 1 year ago

@RaphaelS1 A dataset working is a clear reason for me. We should not feature examples where the shown operation "doesn't work". That's somewhat hard to set up, and we should therefore allow deviations.

I agree. I wasn't saying we are forced to stick to one dataset, just that we need to be clear why we make changes when we do, and, more importantly, that we should document every dataset we use throughout the book so we can explain them in the appendices.

RaphaelS1 commented 1 year ago