memphis-iis / datawhys-content-notebooks-python

Content for DataWhys in the form of JupyterLab notebooks (.ipynb files)
Apache License 2.0

Things in Random-forests-PS #131

Status: Open. sdflem opened this issue 1 year ago.

sdflem commented 1 year ago

(1) The first sentence of the notebook says (emphasis added):

In this session, we'll use the boston dataset, which has been used to examine the relationship between clean air and house prices:

Several of us found this sentence confusing because it sets the incorrect expectation that clean air will be important in what follows. However, the dataset contains a wide variety of variables, and the clean-air variable is no more special than the others.

(2) In the "Fit Models" section, the calls to predict return predictions, but nothing is done with them; for example, the predictions are never printed or otherwise used. In the subsequent "Evaluate the models" section, calls are made to score. Although the API docs are a little unclear to me, it appears that score calls predict internally (for regressors, it computes R² between the output of predict and the true targets). This means that the calls to predict in the previous section are essentially pointless.
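A minimal sketch of why the separate predict calls add nothing: for a scikit-learn regressor, score(X, y) gives the same value as computing r2_score on the output of predict(X). This assumes synthetic data from make_regression (load_boston was removed in scikit-learn 1.2) and uses RandomForestRegressor as a stand-in for whichever models the notebook actually fits.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's data (assumption: not the Boston dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# score() predicts internally and returns R^2 ...
score_value = model.score(X_test, y_test)
# ... which matches computing R^2 by hand from an explicit predict() call
manual_value = r2_score(y_test, model.predict(X_test))

print(score_value, manual_value)
```

If the notebook wants to keep the predict calls for teaching purposes, printing or plotting the predictions against y_test would give them a visible role.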

(3) The answer given to the question, "Look carefully at the three feature importance plots, hovering your mouse over each bar. What are the major differences between them?" refers to two of the models as "ensemble models". This terminology is a little confusing. I see that those models come from the sklearn.ensemble module. If the answer is meant to refer to this module, then I would use the full name and typeset it as code (i.e., sklearn.ensemble). If the answer is meant to refer to ensembles as a concept, then it would be good to remind the reader what that concept means (a model that combines the predictions of many base models, such as many decision trees).
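As a possible reminder of what "ensemble" means, here is a minimal sketch using a random forest, assuming it is one of the two sklearn.ensemble models in the notebook (the issue does not say which models are used) and again substituting synthetic make_regression data. It checks that the forest's prediction is the average of its individual trees' predictions, and that its feature_importances_ (what the feature importance plots show) is the trees' importances averaged and renormalized.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the notebook's data (assumption: not the Boston dataset)
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=1)

# A random forest is an ensemble: many decision trees, each fit on a bootstrap sample
forest = RandomForestRegressor(n_estimators=25, random_state=1).fit(X, y)

# The forest's prediction is the average of its trees' predictions
tree_preds = np.stack([tree.predict(X) for tree in forest.estimators_])
averaged = tree_preds.mean(axis=0)
print(np.allclose(averaged, forest.predict(X)))

# The forest's feature importances are the trees' importances, averaged and renormalized
tree_importances = np.stack([tree.feature_importances_ for tree in forest.estimators_])
mean_importances = tree_importances.mean(axis=0)
mean_importances /= mean_importances.sum()
print(np.allclose(mean_importances, forest.feature_importances_))
```

This averaging over many trees is one reason the ensemble models' importance plots tend to look smoother than a single decision tree's.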

aolney commented 6 months ago

I'll have to follow up on this later. Will add it to the tracker.