nationalparkservice / QCkit

QCkit provides useful functions for data quality control and manipulation, including updating data to DarwinCore standards, unit conversions, and data flagging.
https://nationalparkservice.github.io/QCkit/

Sarah comments #6

Open wright13 opened 1 year ago

wright13 commented 1 year ago
RobLBaker commented 1 year ago

1) Good idea. First I'd want to get a sense of what sorts of checks would be most broadly usable and how people structure their data, so we can implement this well. Is there a standard (or at least common) way data are structured and checks are run?
2) Great idea. We should chat about those.
3) Yes, we are hoping people will adopt darwinCore naming conventions, although it's by no means required (see the naming sketch below).
4-x) Vectorizing: yes, I'll put that on the list!
5) te_check: put that in the documentation for the function.
6) Cool! Do share.
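
For reference, adopting darwinCore naming conventions mostly amounts to renaming columns to standard Darwin Core terms (e.g. scientificName, decimalLatitude). Below is a minimal sketch using plain dplyr, with made-up raw column names; it is not a QCkit function:

```r
library(dplyr)

# Made-up raw column names (species, lat, lon, date) mapped to standard
# Darwin Core terms; rename() takes new_name = old_name pairs.
raw <- tibble(
  species = c("Canis latrans", "Ovis canadensis"),
  lat     = c(36.10, 36.21),
  lon     = c(-112.11, -112.09),
  date    = as.Date(c("2022-06-01", "2022-06-02"))
)

dwc <- raw %>%
  rename(
    scientificName   = species,
    decimalLatitude  = lat,
    decimalLongitude = lon,
    eventDate        = date
  )
```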

wright13 commented 1 year ago
  1. Lots of QC checks are going to be pretty dataset-specific, but we could start people off by loading the skimr and/or dlookr packages and running some of the basic summaries included in those packages (see the first sketch after this list). And maybe include some code snippets for reading data from SQL, Access, and/or AGOL. I think it could also be helpful to come up with several common categories of QC checks (e.g. missing data, outliers, nonsensical values, spatial data) and put those into a sample outline with options to organize by SOP. I don't think this is a template that we can expect to work right out of the box, but hopefully suggesting some tools and structure will lower the barrier to reproducible QC.
  2. Sweet, let's find a time. If we come up with a rough draft of contribution guidelines, we could post it in the data sci CoP for feedback. It's also a good place to get feedback on a QC template.
  3. I will try to get my UTM -> lat/long code updated and shared this week. Shouldn't take long, in theory... (a generic sketch of the conversion is included after this list in the meantime).
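
To make the skimr/dlookr suggestion in item 1 concrete, here is a minimal sketch of the kind of generic starting-point summaries a template could run; `my_data` is just a placeholder for whatever dataset is being checked:

```r
library(skimr)
library(dlookr)

my_data <- iris  # placeholder standing in for a monitoring dataset

skim(my_data)              # skimr: per-variable completeness and distribution summary
diagnose(my_data)          # dlookr: missing counts and unique values per variable
diagnose_numeric(my_data)  # dlookr: numeric summaries plus zero/negative/outlier counts
```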
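
The UTM -> lat/long code mentioned in item 3 isn't attached here; in the meantime, this is a sketch of one common approach using the sf package, with an assumed UTM zone (EPSG:26912, NAD83 / UTM zone 12N) and made-up coordinates:

```r
library(sf)

# Made-up UTM coordinates for illustration only
utm_pts <- data.frame(
  easting  = c(400000, 401500),
  northing = c(4000000, 4001500)
)

# Assume NAD83 / UTM zone 12N (EPSG:26912); real data would need the correct
# zone and datum
pts_utm <- st_as_sf(utm_pts, coords = c("easting", "northing"), crs = 26912)

# Reproject to geographic coordinates (WGS84 lat/long, EPSG:4326)
pts_latlong <- st_transform(pts_utm, crs = 4326)
st_coordinates(pts_latlong)  # X = longitude, Y = latitude
```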