Why index the band as the last dimension instead of the first? Is this a raster analysis convention?
It would be good to add a note saying that the band index starts at 1 instead of 0.
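For illustration, a note along these lines could make both points concrete. This is only a sketch: it assumes the raster arrives band-first as (bands, rows, cols), which is how libraries such as rasterio return it, and that the notebook then moves the band axis to the end. The array shape and the band_to_index helper are made up for the example.

```python
import numpy as np

# Hypothetical 3-band raster of 4 x 5 pixels, laid out band-first: (bands, rows, cols).
band_first = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Move the band axis to the end: (rows, cols, bands).
# matplotlib's imshow accepts this layout for RGB display.
band_last = np.moveaxis(band_first, 0, -1)


def band_to_index(band_number):
    """Convert a 1-based band number to a 0-based array index."""
    if band_number < 1:
        raise ValueError("band numbers start at 1")
    return band_number - 1


# Band 1 is the same slice of data in either layout.
band1_first = band_first[band_to_index(1)]
band1_last = band_last[..., band_to_index(1)]

print(band_first.shape, band_last.shape)
print(np.array_equal(band1_first, band1_last))
```

Whichever layout the course settles on, stating it once next to the band-numbering note would save participants from guessing.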
Task 2:
Consider specifying the outcome variable to be checked, e.g. ndvi
I got np.nanmean(ndvi) = 0.09861903527768272, and the exact float comparison in the assertion failed. Consider changing it to: assert np.isclose(np.nanmean(ndvi), 0.098619044)
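A small sketch of why the tolerant comparison helps, using the two numbers quoted above. The variable names are mine, not the notebook's.

```python
import numpy as np

observed = 0.09861903527768272  # value I got from np.nanmean(ndvi)
expected = 0.098619044          # reference value from the suggested assertion

# Exact equality fails: the two values differ from about the 8th decimal place.
print(observed == expected)

# np.isclose compares within a tolerance (defaults: rtol=1e-05, atol=1e-08).
print(np.isclose(observed, expected))

assert np.isclose(observed, expected)
```

If a tighter check is wanted, rtol and atol can be passed explicitly rather than relying on the defaults.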
Task 3
Same as task 2 comments above
Task 4
Consider stating in the instructions what the participants are expected to do with the fetched GDP and population data. Do you want them to add a gdp_usd column and a population column to the guns data frame, or create a new data frame with these two columns?
Perhaps this task can be broken down into two steps as the latter depends on the former:
adding the iso3 code column and
adding the GDP and population columns
This also allows adding checkpoints (e.g. assertions) after each step.
The approach suggested by the code comment could unintentionally DDoS the bank's API server, since it makes one request per country: 185 countries * 50 participants = close to 10k requests during the session. According to the API docs, multiple countries can be requested at once in the USA;AFG format. Consider providing an API query helper function that fetches the results for a list of specified countries, and have participants use that function instead.
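A rough sketch of what such a helper might look like, assuming the World Bank v2 indicator endpoint and its JSON response shape of [metadata, records]. The function names, the year parameter and the chunk size are my own choices; the API may cap the number of countries or the URL length, so chunk_size would need checking against the docs. NY.GDP.MKTP.CD is the GDP (current US$) indicator; SP.POP.TOTL would be the population one.

```python
import json
import urllib.request

BASE_URL = "https://api.worldbank.org/v2/country/{codes}/indicator/{indicator}"


def build_urls(iso3_codes, indicator, year, chunk_size=50):
    """Return one request URL per chunk of countries instead of one per country."""
    urls = []
    for start in range(0, len(iso3_codes), chunk_size):
        chunk = iso3_codes[start:start + chunk_size]
        url = BASE_URL.format(codes=";".join(chunk), indicator=indicator)
        # One year per request, so one record per country fits on a single page.
        urls.append(f"{url}?format=json&date={year}&per_page={len(chunk)}")
    return urls


def fetch_indicator(iso3_codes, indicator, year, chunk_size=50):
    """Fetch {iso3: value} for all countries using a few batched requests."""
    values = {}
    for url in build_urls(iso3_codes, indicator, year, chunk_size):
        with urllib.request.urlopen(url, timeout=30) as response:
            payload = json.load(response)
        # Assumed response shape: [metadata, list_of_records]; errors return one element.
        records = payload[1] if len(payload) > 1 and payload[1] else []
        for record in records:
            values[record["countryiso3code"]] = record["value"]
    return values


# Building the URLs needs no network access, so it is safe to try out directly.
urls = build_urls(["USA", "AFG", "GBR"], "NY.GDP.MKTP.CD", 2017, chunk_size=2)
print(urls)
```

With 185 countries and chunk_size=50 this comes to 4 requests per indicator per participant rather than 185.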
Task 5.1
Based on the comment, I assume the join is an inner join. If that is correct, the resulting dataset has only 23 rows: the mass shooting dataset has only 24 rows, the guns data has 190, and the UK is not present in the guns dataset. It feels like a waste to fetch the ISO3 codes, GDP and population data for all 190 countries if we only need 23 of them in the end...
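To show the behaviour I mean, here is a toy example with made-up frames and hypothetical column names (iso3, firearms, mass_shootings). It is a sketch of an inner join plus a way to see which countries drop out, not the notebook's actual data.

```python
import pandas as pd

# Toy stand-ins for the two datasets; values and column names are invented.
guns = pd.DataFrame({"iso3": ["USA", "AFG", "FRA"], "firearms": [100, 20, 30]})
shootings = pd.DataFrame({"iso3": ["USA", "FRA", "GBR"], "mass_shootings": [5, 1, 2]})

# An inner join keeps only the countries present in both frames.
inner = guns.merge(shootings, on="iso3", how="inner", validate="one_to_one")

# An outer join with indicator=True shows which rows each side is missing.
outer = guns.merge(shootings, on="iso3", how="outer", indicator=True)
unmatched = outer[outer["_merge"] != "both"]

print(len(inner))
print(unmatched[["iso3", "_merge"]])
```

A check like this, or an assertion on the expected row count, could also serve as a checkpoint after the join.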
Task 5.2
Is this task supposed to be done on the merged dataset from 5.1? Perhaps clarify it in the instructions.
Once one figures out how to do the first one the subsequent 2 are the same. What's the rationale behind the repeat?
Not very relevant to the technical part of the exercise, but rather contextual: Would people question the implications by citing small sample size?
Task 6
What's the rule for identifying outliers here? There are now two axes to which the outlier definition above could apply. Should I apply it to x, to y, or to both?
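The ambiguity can be seen in a small sketch. I am assuming a 1.5 * IQR rule here purely for illustration, which may not be the definition the notebook uses; the data and the iqr_outliers helper are made up.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up data: one point is extreme in x only, another in y only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
y = np.array([2.0, 3.0, 50.0, 3.5, 2.5, 3.0])

out_x = iqr_outliers(x)
out_y = iqr_outliers(y)

either = out_x | out_y  # flag a point if it is extreme on any axis
both = out_x & out_y    # flag a point only if it is extreme on both axes

print(either.sum(), both.sum())
```

The two choices flag different points, so the plots will differ depending on which rule participants pick.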
The instructions didn't specify the type of plot, so I assumed a scatter plot for all 3 plots. Is that intended, or are there other plot types that are meant to be practiced here?
Another non-technical, contextual question: for the first plot, might there be a (mis)interpretation that the higher the GDP, the more mass shootings?
Task 7
guns['x'] = guns[u'Average total all civilian firearms'] / guns['Population'] * 10
guns['y'] = guns[u'Number of mass shootings'] / guns['Population'] * 1e7
This assumes:
the participant has been modifying the guns dataset
the population column is named "Population"
Neither assumption was true for me. Perhaps specify them in the previous sections so that they hold by the time the participant reaches this task?
Why the factors 10 and 1e7? I'm not quite sure which units the datasets assume. Consider adding a note to explain.
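My guess at what the factors mean, written out as a sketch. It assumes both columns are raw counts and the population is a head count, which I could not confirm from the datasets; the constant names, the per_capita helper and the numbers are invented.

```python
# Assumed meaning of the two scale factors (not confirmed by the notebook).
FIREARMS_PER_N_PEOPLE = 10            # x: firearms per 10 inhabitants
SHOOTINGS_PER_N_PEOPLE = 10_000_000   # y: mass shootings per 10 million inhabitants


def per_capita(count, population, per):
    """Scale a raw count to a rate per `per` inhabitants."""
    return count / population * per


# Made-up country: 50 million people, 10 million firearms, 5 mass shootings.
population = 50_000_000
firearms = 10_000_000
mass_shootings = 5

x = per_capita(firearms, population, FIREARMS_PER_N_PEOPLE)
y = per_capita(mass_shootings, population, SHOOTINGS_PER_N_PEOPLE)

print(x, y)
```

If this reading is right, naming the factors or labelling the axes with the units would remove the guesswork.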
Task 1:
s/Something is not write/Something is not right
The plot code is missing this import:
import matplotlib.pyplot as plt
I pushed my attempt for project 2 here: https://github.com/worldbank/dec-python-course/blob/project2-wei/1-foundations/project-2-rasters_and_functions/Project%202.ipynb
It took me 4 hrs to finish everything, including writing the comments here.