worldbank / dec-python-course

Project2 #11

Closed gauravcusp closed 2 years ago

weilu commented 2 years ago

Task 1:

s/Something is not write/Something is not right

The plot code is missing import matplotlib.pyplot as plt.

Why is the band indexed as the last dimension instead of the first? Is that a raster analysis convention?

It would be good to add a note saying that band indexing starts at 1 rather than 0.
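To make the two points above concrete, here is a minimal NumPy-only sketch. It assumes the raster is read band-first, i.e. shaped (bands, rows, cols), which is how readers such as rasterio return data, and that band numbers are 1-based while array indices are 0-based. The array contents and variable names are hypothetical, not taken from the notebook.

```python
import numpy as np

# Hypothetical 3-band raster in band-first layout: (bands, rows, cols).
band_first = np.arange(3 * 2 * 4).reshape(3, 2, 4)

# Band-last layout (rows, cols, bands), the shape plt.imshow expects for RGB images.
band_last = np.moveaxis(band_first, 0, -1)

# Band numbers are 1-based, array indices are 0-based (assumed convention).
band_number = 1
first_band = band_first[band_number - 1]

print(band_first.shape, band_last.shape)
```

If the notebook moves the band axis to the end for plotting, a one-line note saying so would answer the question above.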

Task 2:

Consider specifying the outcome variable to be checked, e.g. ndvi

I got np.nanmean(ndvi) = 0.09861903527768272 and the float comparison assertion failed. Consider changing it to assert np.isclose(np.nanmean(ndvi), 0.098619044)
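A small sketch of why the tolerant comparison helps, using only the two numbers quoted above. The variable names are hypothetical; the notebook's actual assertion may differ.

```python
import numpy as np

computed = 0.09861903527768272  # value I got from np.nanmean(ndvi)
expected = 0.098619044          # reference value from the suggested assertion

# An exact comparison fails: the two floats differ around the 8th decimal place.
assert computed != expected

# np.isclose passes when |computed - expected| <= atol + rtol * |expected|
# (defaults: rtol=1e-05, atol=1e-08).
assert np.isclose(computed, expected)
```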

Task 3

Same as task 2 comments above

Task 4

Consider adding to the instructions what the participants are expected to do with the fetched GDP and population data. Do you want them to add a gdp_usd column and a population column to the guns data frame, or to create a new data frame with these two columns?

Perhaps this task can be broken down into two steps as the latter depends on the former:

  1. adding the iso3 code column and
  2. adding the GDP and population columns

This also allows adding checkpoints (e.g. assertions) after each step.

The approach suggested by the code comment could unintentionally DDoS the Bank's API server, since it makes a single request for every country: 185 countries * 50 participants = close to 10k requests during the session. According to the API docs, multiple countries can be requested at once in the USA;AFG format. Consider providing an API query helper function that fetches the results for a list of specified countries, and have participants use that function instead.
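A rough sketch of what such a helper could look like. The base URL, the indicator codes, the query parameters and the response shape are my assumptions about the World Bank API v2 and should be checked against the API docs; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed World Bank API v2 base URL and indicator codes; verify against the docs.
BASE_URL = "https://api.worldbank.org/v2"
GDP_INDICATOR = "NY.GDP.MKTP.CD"  # GDP in current US$ (assumed code)
POP_INDICATOR = "SP.POP.TOTL"     # total population (assumed code)


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_url(iso3_codes, indicator, year):
    """Build one request URL covering several countries (USA;AFG format)."""
    countries = ";".join(iso3_codes)
    return (f"{BASE_URL}/country/{countries}/indicator/{indicator}"
            f"?format=json&date={year}&per_page=1000")


def fetch_indicator(iso3_codes, indicator, year, batch_size=50):
    """Return {iso3: value}, issuing one request per batch of countries."""
    values = {}
    for batch in chunked(list(iso3_codes), batch_size):
        with urlopen(build_url(batch, indicator, year)) as response:
            payload = json.load(response)
        # Assumed response shape: [metadata, records]; records may be missing.
        records = payload[1] if len(payload) > 1 and payload[1] else []
        for record in records:
            values[record["countryiso3code"]] = record["value"]
    return values
```

With a batch size of 50, 185 countries need 4 requests per indicator per participant instead of 185. A call would look like fetch_indicator(guns_iso3_codes, GDP_INDICATOR, year), where guns_iso3_codes and year are whatever the notebook defines.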

Task 5.1

Based on the comment, I assume the join is an inner join. If that is correct, the joined dataset has only 23 rows: the mass shooting dataset has only 24 rows, the guns dataset has 190, and the UK is not present in the guns dataset. It feels like a waste to get the ISO3 codes, GDP and population data for all 190 countries if we only need 23 of them in the end...
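For reference, a toy sketch of how a participant could check which rows an inner join drops. The data frames, column names and values below are made up for illustration and are not the course datasets.

```python
import pandas as pd

# Toy stand-ins for the two datasets; names and numbers are hypothetical.
guns = pd.DataFrame({"iso3": ["USA", "CAN", "FRA", "DEU"],
                     "firearms": [100, 20, 15, 18]})
shootings = pd.DataFrame({"iso3": ["USA", "FRA", "GBR"],
                          "mass_shootings": [5, 1, 1]})

# Inner join keeps only countries present in both frames.
inner = shootings.merge(guns, on="iso3", how="inner")

# A left join with indicator=True flags rows with no match in guns.
checked = shootings.merge(guns, on="iso3", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "iso3"].tolist()

print(len(inner))  # 2
print(unmatched)   # ['GBR']
```

An assertion on the expected row count after the join would make a good checkpoint for this task.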

Task 5.2

Is this task supposed to be done on the merged dataset from 5.1? Perhaps clarify it in the instructions.

Once you figure out how to do the first one, the other two are done the same way. What's the rationale behind the repetition?

Not very relevant to the technical part of the exercise, but rather contextual: would people question the implications by citing the small sample size?

Task 6

What's the rule for identifying outliers here? There are now two axes to which the outlier definition above could apply. Should I apply it to x, to y, or to both?

The task doesn't specify which type of plot to use, so I assumed a scatter plot for all 3 plots. Is that intended? Or are there other plot types that are meant to be practiced here?

Another non-technical, contextual question: for the first plot, might there be a (mis)interpretation that the higher the GDP, the more mass shootings?

Task 7

guns['x'] = guns[u'Average total all civilian firearms'] / guns['Population'] * 10
guns['y'] = guns[u'Number of mass shootings'] / guns['Population'] * 1e7

This assumes:

  1. the participant has been modifying the guns dataset
  2. the population column is named "Population"

Neither assumption was true for me. Perhaps state them in the previous sections so that they hold by the time the participant reaches this task?

Why the factors 10 and 1e7? I'm not sure which units the datasets assume. Consider adding a note to explain.
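My guess at what the factors mean, as a sketch: if both columns are raw counts and the population is in persons, multiplying a per-person ratio by N gives a rate per N people. The function name and the numbers below are hypothetical, and the assumption about raw units is exactly the part that needs confirming.

```python
def rate_per(count, population, per):
    """Scale a raw count to 'count per `per` people'."""
    return count / population * per


# Hypothetical country: 60 million people, 12 million firearms, 3 mass shootings.
population = 60_000_000
firearms_per_10_people = rate_per(12_000_000, population, 10)
shootings_per_10m_people = rate_per(3, population, 1e7)

print(firearms_per_10_people, shootings_per_10m_people)
```

If that reading is right, x would be firearms per 10 people and y would be mass shootings per 10 million people; naming the factors or labelling the axes that way would remove the guesswork.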

I pushed my attempt for project 2 here: https://github.com/worldbank/dec-python-course/blob/project2-wei/1-foundations/project-2-rasters_and_functions/Project%202.ipynb

It took me 4 hrs to finish everything, including writing the comments here.