lukepolson / HEP_ML_Lessons

Suggested improvements to lesson #2

Open meevans1 opened 4 years ago

meevans1 commented 4 years ago

Thanks @lukepolson for this amazing work! Below I list some comments. I can start working through some of them if you agree.

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/01-introduction.md#what-is-machine-learning "Classification. The input is multi-dimensional data points and the output is an integer (which represents different classes). Consider the following example with two classes:" Couldn't this be confusing? In the hands-on part we have an example with two classes (and we're trying to classify), but the output of our machine learning algorithm isn't an integer.

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/01-introduction.md#what-role-does-machine-learning-have-in-particle-physics "(My Research)..." I think the link to ML could be made clearer here. Is there any way you can tie it back in to explain whether it's regression, classification or generation?

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/01-introduction.md I’m not sure the key point "In general, machine learning is about designing a function f...." is totally clear from the lesson. Thinking of the key points as a summary, maybe a more suitable key point could be “The 3 main tasks of Machine Learning are regression, classification and generation”. This would also fit with the 3 sections into which you’ve split this lesson.

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/02-mltechnical.md#loss-function-and-likelihood I think the last sentence "Thus minimizing the MSE..." might be a bit of a logical jump. Can you add a sentence in between to help the logic? Maybe you could also connect this statement to the plots more.
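To make the suggestion concrete, here is a sketch of the kind of intermediate step I have in mind. It assumes (my assumption, not necessarily the lesson's wording) that each target is the model prediction plus independent Gaussian noise of fixed width $\sigma$, with $f$ the model and $N$ the number of data points:

```latex
% Assumption: y_i = f(x_i) + \epsilon_i, with \epsilon_i \sim \mathcal{N}(0, \sigma^2) independent.
% Definition: MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - f(x_i))^2
\begin{align}
L &= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(y_i - f(x_i))^2}{2\sigma^2}\right) \\
-\ln L &= \frac{N}{2}\ln(2\pi\sigma^2)
        + \frac{1}{2\sigma^2}\sum_{i=1}^{N} (y_i - f(x_i))^2 \\
       &= \frac{N}{2}\ln(2\pi\sigma^2) + \frac{N}{2\sigma^2}\,\mathrm{MSE}
\end{align}
```

The first term does not depend on $f$ and the factor $N/(2\sigma^2)$ is positive, so under this assumption the $f$ that minimizes the MSE is the same $f$ that maximizes the likelihood.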

A general comment is that it'd be really nice if we could also find links to free, online material as well as the books, to ease accessibility. e.g. https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/02-mltechnical.md#regression-classification-generation

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/03-Resources.md#proficiency-in-python You mention 3 Python libraries but only list 2. Did you forget one? Aren't NumPy and pandas enough?

meevans1 commented 4 years ago

Adding some more suggestions as I keep going through.

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/03-Resources.md I like your key point "scikit-learn and TensorFlow are two good options for machine learning in python." Taking inspiration from this, I'd prefer something like "NumPy and pandas are the main libraries for scientific computing", rather than the key points "Textbook provided...".

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/03-Resources.md Teaching time currently says 10 mins. Similar to intro and maths, isn't this meant to be read before? In which case it should be 0 mins for consistency?

In various places you say "data set" and in others "dataset". Are you changing on purpose? If not, I suggest trying to stay consistent.
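One possible way to find the inconsistent spellings is a small script like the one below. This is only a sketch: the function names `find_two_word_spelling` and `normalise` and the sample text are made up for illustration, and the regular expression assumes "dataset" is the spelling to keep.

```python
import re

# Matches "data set" / "data sets" in any capitalisation, but not "dataset".
PATTERN = re.compile(r"\bdata\s+sets?\b", re.IGNORECASE)


def find_two_word_spelling(text):
    """Return (line number, line) pairs that contain the two-word spelling."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


def normalise(text):
    """Join "data set" into "dataset", keeping the original capitalisation."""
    return PATTERN.sub(lambda m: re.sub(r"\s+", "", m.group(0)), text)


# Hypothetical lesson text, not taken from the repository.
sample = (
    "We load the data set.\n"
    "The dataset has 10 columns.\n"
    "Data sets can be large."
)

hits = find_two_word_spelling(sample)
fixed = normalise(sample)
print(hits)
print(fixed)
```

Running it on each file under `_episodes/` would list the lines to review before changing anything.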

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/06-Model_Comparison.md I'd like it if the key points from lesson 6 were more of a summary of the lesson, e.g. "Many metrics exist to assess classifier performance." and "Making plots is useful to assess classifier performance."

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/07-nn.md Shouldn't there be some "teaching" time for this episode, so the students can read through it? Or will you let the students read through it afterwards?

meevans1 commented 4 years ago

I attempted to add links to online resources in pull request #4 https://github.com/lukepolson/HEP_ML_Lessons/pull/4

I wrote my suggested lesson key points improvements in pull request #5 https://github.com/lukepolson/HEP_ML_Lessons/pull/5

I changed occurrences of "data set" to "dataset" for consistency throughout in pull request #6 https://github.com/lukepolson/HEP_ML_Lessons/pull/6

I suggested a clarification to "Thus minimizing the MSE.." in pull request #7 https://github.com/lukepolson/HEP_ML_Lessons/pull/7

I've changed my mind about this comment: https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/01-introduction.md#what-is-machine-learning "Classification. The input is multi-dimensional data points and the output is an integer (which represents different classes). Consider the following example with two classes:" Couldn't this be confusing? In the hands-on part we have an example with two classes (and we're trying to classify), but the output of our machine learning algorithm isn't an integer.

Having read through the rest of your tutorial, I like how general your introduction is. You then mention in the Model Training episode that it's up to the user to decide the threshold. This is good.
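For anyone reading along, this is roughly how the two views fit together: the model returns a continuous score, and a user-chosen threshold turns it into an integer class. The snippet is a minimal sketch with made-up scores and a hypothetical helper `classify`; it is not code from the tutorial.

```python
def classify(scores, threshold=0.5):
    """Map continuous classifier scores in [0, 1] to integer class labels."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical classifier outputs, e.g. predicted signal probabilities.
scores = [0.05, 0.40, 0.55, 0.93]

labels_default = classify(scores)       # threshold of 0.5
labels_strict = classify(scores, 0.9)   # stricter cut labels fewer events as class 1
print(labels_default, labels_strict)
```

Raising the threshold trades efficiency for purity, which is why leaving the choice to the user makes sense.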

meevans1 commented 4 years ago

So this just leaves a couple of suggestions:

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/01-introduction.md#what-role-does-machine-learning-have-in-particle-physics "(My Research)..." I think the link to ML could be made clearer here. Is there any way you can tie it back in to explain whether it's regression, classification or generation?

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/03-Resources.md Teaching time currently says 10 mins. Similar to intro and maths, isn't this meant to be read before? In which case it should be 0 mins for consistency?

https://github.com/lukepolson/HEP_ML_Lessons/blob/gh-pages/_episodes/07-nn.md Shouldn't there be some "teaching" time for this episode, so the students can read through it? Or will you let the students read through it afterwards?

A general question: how long do you have for your tutorial? Is it an hour, or 1:15 as your lesson index suggests? Do you know someone who could run through it and test the timing? I could ask someone if not.