maeve70 / intro-data-capstone-biodiversity


Count vs. nunique #2

Open Caealana opened 6 years ago

Caealana commented 6 years ago

https://github.com/maeve70/intro-data-capstone-biodiversity/blob/1d3be458f37586b48ea8c82acaebb48a6d3c971a/BiodiversityProject/biodiversity.py#L121-L123

Nice job - your code has been perfect so far! Here, you use ".scientific_name.count()" where you should use ".scientific_name.nunique()", as you did in your previous lines. Since some scientific names are repeated across rows, some of your protection_counts end up higher than they should be. count() counts every row that has a non-null scientific name, repeats included, while nunique() counts each distinct scientific name only once.
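
To make the difference concrete, here is a minimal sketch on made-up data (the Series below is hypothetical and not taken from the project's dataset). It only assumes pandas is installed:

```python
import pandas as pd

# Hypothetical toy data: "Canis lupus" appears in two rows.
names = pd.Series(
    ["Canis lupus", "Canis lupus", "Myotis lucifugus"],
    name="scientific_name",
)

total_rows = names.count()        # non-null entries, repeats included
distinct_names = names.nunique()  # distinct non-null values only

print(total_rows, distinct_names)  # prints "3 2"
```

The repeated name is what pushes count() above nunique() here.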

maeve70 commented 6 years ago

Ah, gotcha. But the instructions say to paste the code, and that code uses count. Maybe the instructions are wrong? They read: "Paste the following code and run it to create a new DataFrame called protection_counts, which is sorted by scientific_name:"

`protection_counts = species.groupby('conservation_status').scientific_name.count().reset_index().sort_values(by='scientific_name')`

Caealana commented 6 years ago

@maeve70 Ah, OK! Sorry for the confusion. I usually run and grade assignments on my own machine. Codecademy's system for evaluating code sometimes behaves differently from what you'd see when running the code on your own machine. The instructions are indeed wrong, because count and nunique give different results, and for the rest of the assignment they want you to use nunique.
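
For reference, a sketch of the pasted snippet with nunique swapped in for count. The small `species` DataFrame here is invented for illustration (the real one comes from the project's data); only the column names `conservation_status` and `scientific_name` are taken from the instructions:

```python
import pandas as pd

# Hypothetical stand-in for the project's species DataFrame.
species = pd.DataFrame({
    "scientific_name": [
        "Canis lupus", "Canis lupus", "Grus americana", "Myotis lucifugus",
    ],
    "conservation_status": [
        "Endangered", "Endangered", "Endangered", "Species of Concern",
    ],
})

# Same pipeline as in the instructions, but counting distinct names per status.
protection_counts = (
    species.groupby("conservation_status")
    .scientific_name.nunique()
    .reset_index()
    .sort_values(by="scientific_name")
)

print(protection_counts)
```

With this toy data, the "Endangered" group has three rows but only two distinct names, so count would report 3 where nunique reports 2.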

maeve70 commented 6 years ago

Ah, ha. gotcha. I changed the code (and used nunique) on my end after I read your comments and I get it now. Makes sense. :) Thank you again for the review. Much appreciated!