gvwilson / sql-tutorial

The Querynomicon: An Introduction to SQL for Wary Data Scientists
https://gvwilson.github.io/sql-tutorial/

unspecified null handling #36

Closed 2x2xplz closed 5 months ago

2x2xplz commented 5 months ago

For 48, 49, and 50, I would suggest adding `and body_mass_g is not null` to the `where` clause. Without it, penguins without a measurement (`body_mass_g is null`) are classified as large (48 & 49) and abnormal (50). Failing to specify how null values should be handled (ideally by filtering them out of the analysis) is a common data-wrangling error that can easily lead to incorrect results. Classifying 1 Adelie as 'large' (>5000 g) when no measured Adelie even reaches 4800 g is an example of that.
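A minimal sketch of the effect, using Python's built-in `sqlite3` module. The table layout, the sample rows, and the size thresholds are assumptions made up for illustration, not the tutorial's actual data or queries. It relies on the fact that a comparison against null is never true in SQL, so a row with a null mass falls through to the `else` branch of a `case` expression:

```python
import sqlite3

# Hypothetical in-memory table; rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("create table penguins (species text, body_mass_g real)")
conn.executemany(
    "insert into penguins values (?, ?)",
    [("Adelie", 3750.0), ("Adelie", None), ("Gentoo", 5200.0)],
)

# Assumed classification query: thresholds are illustrative.
classify = """
    select species, body_mass_g,
           case when body_mass_g < 3500 then 'small'
                when body_mass_g < 5000 then 'medium'
                else 'large'
           end as size
    from penguins
    {where}
    order by rowid
"""

# Without a null filter, the unmeasured Adelie lands in the 'else' branch.
unfiltered = conn.execute(classify.format(where="")).fetchall()

# With the suggested filter, unmeasured penguins are excluded up front.
filtered = conn.execute(
    classify.format(where="where body_mass_g is not null")
).fetchall()

for row in unfiltered:
    print("unfiltered:", row)
for row in filtered:
    print("filtered:  ", row)
```

In the unfiltered result the penguin with no measurement is labeled 'large'; in the filtered result it does not appear at all.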

gvwilson commented 5 months ago

good catch - thank you