lter / ssecr

Synthesis Skills for Early Career Researchers Course
https://lter.github.io/ssecr/

Quality Control Considerations #25

Open njlyon0 opened 2 weeks ago

njlyon0 commented 2 weeks ago

Summary

We need to outline some particularly thorny quality control issues so that students know to check for and resolve them.

Sub-Tasks

Resources

njlyon0 commented 2 weeks ago

Examples

Taxonomic issues (i.e., aliases / synonyms that differ among datasets but refer to the same species)

lkuiucsb commented 2 weeks ago
  1. Verifying taxonomic classifications against authoritative sources, such as ITIS or WoRMS, to keep taxonomic names consistent across projects.
  2. Handling missing data across sites. Different projects might define missing data differently.
  3. Aggregating data to consistent temporal or spatial scales.
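
A minimal sketch of points 1 and 3, in plain Python so it does not depend on any particular package. The `SYNONYMS` table, `harmonize_name()`, and `monthly_means()` are hypothetical names used only for illustration. In practice the synonym table would be built from an authority such as ITIS or WoRMS rather than typed by hand, and the single entry shown is just an example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical synonym table (old name -> accepted name). Illustrative only:
# a real table should come from an authority such as ITIS or WoRMS.
SYNONYMS = {
    "Strongylocentrotus franciscanus": "Mesocentrotus franciscanus",
}


def harmonize_name(name, synonyms=SYNONYMS):
    """Tidy whitespace and capitalization, then map known synonyms to one name."""
    cleaned = " ".join(name.split())  # collapse repeated / stray spaces
    # Assumes plain binomials: capitalize the genus, lower-case the rest
    cleaned = cleaned[:1].upper() + cleaned[1:].lower()
    return synonyms.get(cleaned, cleaned)


def monthly_means(records):
    """Aggregate (date, value) pairs to monthly means, skipping missing values.

    Missing values are assumed to already be coded as None.
    """
    buckets = defaultdict(list)
    for day, value in records:
        if value is not None:
            buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


observations = [
    (date(2020, 1, 1), 2.0),
    (date(2020, 1, 15), 4.0),
    (date(2020, 2, 1), None),  # missing: February gets no mean at all
]
print(harmonize_name("  strongylocentrotus   franciscanus "))
print(monthly_means(observations))
```

Note that a month with only missing values is dropped rather than reported as zero, which is one of the places where projects can differ in how they define missing data.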
scelmendorf commented 2 weeks ago
  1. Values out of range (either make informed filters to spot-check and flag, or take a data-driven approach, i.e., just remove anything >N SD from the mean)
  2. Latitudes and longitudes reversed, or western hemisphere longitudes missing the negative sign.
  3. Duplicates
  4. Whether missing data are missing completely at random (MCAR) or non-ignorable
  5. Date formatting
  6. Missing value codes
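
Several of these checks can be sketched in a few lines of standard-library Python. Everything here is an assumption for illustration: the function names are hypothetical, and the set of missing value codes, the list of date formats, and the bounding box should all come from each dataset's metadata. Point 4 (MCAR versus non-ignorable missingness) needs a statistical judgment and is not covered.

```python
from datetime import datetime
from statistics import mean, stdev

# Assumed missing value codes; real codes vary by project.
MISSING_CODES = {"", "NA", "-999", "-9999"}


def parse_value(raw, missing_codes=MISSING_CODES):
    """Convert a raw string to float, mapping missing value codes to None."""
    text = raw.strip()
    return None if text in missing_codes else float(text)


def flag_outliers(values, n_sd=3.0):
    """Flag (not remove) values more than n_sd standard deviations from the mean."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return [False] * len(values)
    center, spread = mean(present), stdev(present)
    if spread == 0:
        return [False] * len(values)
    return [v is not None and abs(v - center) > n_sd * spread for v in values]


def in_bbox(lat, lon, south, north, west, east):
    """True if a point is inside the expected study region.

    Swapped lat/long and a missing minus sign on a western longitude both
    tend to push a point outside a sensible bounding box.
    """
    return south <= lat <= north and west <= lon <= east


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def parse_date(text, formats=("%Y-%m-%d", "%m/%d/%Y")):
    """Try each assumed format in turn; raise if none of them match."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")
```

Flagging instead of deleting keeps the decision reviewable. The SD rule is also sensitive to the outliers themselves (they inflate the standard deviation), so an informed range filter may be the safer first pass. Ambiguous dates such as 03/04/2021 are read using whichever format comes first in `formats`, so that order needs to be checked against each source.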
njlyon0 commented 1 week ago

First Draft Integrated

I added the points that you both brought up into the data wrangling module in a new 'QC Considerations' section (see here: https://lter.github.io/ssecr/mod_wrangle.html#qc-considerations).

Note that I did combine some of the separate points you each made about missing data into one point that hopefully catches the spirit of the individual bullets in this issue.

Feel free to keep adding any missing QC considerations to this issue, and we can migrate them into that section of the page until we feel good about the set of issues identified there.