remigabillet / DAT_SF_10

Repository for data science 10 course

Homework 2 review #2

Open abshughes opened 9 years ago

abshughes commented 9 years ago

Hi Remi, below is my review of your homework. Great work overall. You answered all the questions and went further than required. I learned a bit from your HW, so thank you for that. There are several points I especially like in your work.

  1. You wrote your own functions for implementing cross-validation, which is more universal: run_crossvalidation can work with different validators and classifiers.
  2. I assume you had random_state=True by default, so the shuffling produced different results every time; that let you pick a more robust winner, K=13, which gives the highest accuracy across several iterations.
  3. Visuals: I really like how you visualized the outputs. Most of us connected all the dots and got "chainsaw teeth", which look a bit too dramatic. Your plots also illustrate what Francesco was saying about visualization today: we shouldn't connect the dots if we don't know what is in between.

Anna

remigabillet commented 9 years ago

Thanks a lot Anna, I really appreciate the feedback, especially after 2am! :)

On 2. Indeed, I kept getting different results and didn't know about random_state=True or I would have used it. I was getting such different results that I had to implement extra iterations to get solid results.
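
For readers following along, the "extra iterations" idea can be sketched roughly as below. This is not the homework's actual code (run_crossvalidation is not shown in this thread): `knn_predict`, `cv_accuracy`, `repeated_cv`, and the synthetic data are hypothetical stand-ins, written with the standard library only. Each candidate K is scored over several differently shuffled K-fold passes, and the mean accuracy is compared instead of a single noisy run.

```python
import random
from collections import Counter
from statistics import mean

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k, n_folds, rng):
    """One shuffled K-fold pass; returns the mean accuracy over the folds."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    scores = []
    for fold in range(n_folds):
        # Positions congruent to `fold` form the test fold, the rest train.
        test = shuffled[fold::n_folds]
        train = [p for i, p in enumerate(shuffled) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)

def repeated_cv(data, k_values, n_folds=5, n_repeats=10, seed=0):
    """Average accuracy per k over n_repeats differently shuffled passes."""
    rng = random.Random(seed)
    return {k: mean(cv_accuracy(data, k, n_folds, rng) for _ in range(n_repeats))
            for k in k_values}

# Synthetic two-class data: overlapping Gaussian clusters (illustrative only).
data_rng = random.Random(42)
data = ([(data_rng.gauss(0.0, 1.0), 0) for _ in range(40)]
        + [(data_rng.gauss(2.0, 1.0), 1) for _ in range(40)])

mean_scores = repeated_cv(data, k_values=[1, 3, 5, 7, 9, 11, 13])
best_k = max(mean_scores, key=mean_scores.get)
print(best_k, round(mean_scores[best_k], 3))
```

Only odd values of K are tried so that a two-class majority vote cannot tie. Raising `n_repeats` smooths the averages further at the cost of run time.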


abshughes commented 9 years ago

After 2 am there was a mistake in my review: "random_state=None" is what I meant when I wrote "random_state=True".

random_state=None is the default value for the KFold function. With it, the shuffling uses a different seed every time you run the function, so the results differ from run to run as well.

When you specify random_state=0 (or any other integer), the same seed is used every time you run the function, so the results are deterministic.

Hope it helps! Anna