rossyndicate / poudre_sonde_network

MIT License

Working targets! #100

Closed kathryn-willi closed 7 months ago

kathryn-willi commented 7 months ago

This PR mostly moves things around and makes the directory tweaks needed for everything to keep running. I also added Sam’s “realistic” flag. At the very end, I tested the targets pipeline and it totally works, huzzah!!! The closest thing to our “final” dataset for the 2022-2023 field season is currently “data/flagged/all_data_flagged_complete.feather”. But we still have a lot of work to do!

1) The thresholds behind Sam’s realistic flags are very similar to those from our seasonal summaries. This is good, as it indicates that the values we are automatically getting from the workflow are close to what we would expect. I would like to develop some figures that convey their similarity for our final QA/QC report.
2) Is our add_suspect_flag() working exactly as we would expect? This needs to be reviewed thoroughly.
3) We need to re-work how battery, voltage, and baro are treated. I see them more as metadata than parameters. I want them to be tacked onto the metadata of all other parameters, so that each site-param combo’s data frame columns are "DT", "Site", "Parameter", "Value", "Baro", "Battery Level", "Voltage", etc.
4) We still need to develop a flag to capture turbidity drift. A good place for us to start may be the driftR package.
5) Add a step at the beginning of the targets pipeline to create the api/incoming_data folder if it doesn’t exist. Then, add a step at the end of the targets pipeline to move the files in that folder to an “archive” folder where all our historical API pulls will live.
6) Depth data is still very messy since sondes were purposefully moved up and down in their housing. We will need to adjust this data in a new column (depth_adj or something) ahead of publication of the final dataset.
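
For item 5, here is a rough sketch of what the two folder steps could look like in base R. The function names `ensure_incoming_dir()` and `archive_incoming_files()` are hypothetical, and the `api/archive` location is an assumption (only `api/incoming_data` is settled), so treat this as a starting point rather than the final implementation.

```r
# Hypothetical helper: create the incoming-data folder if it is missing.
# The default path comes from item 5; adjust it if the folder lives elsewhere.
ensure_incoming_dir <- function(incoming_dir = file.path("api", "incoming_data")) {
  if (!dir.exists(incoming_dir)) {
    dir.create(incoming_dir, recursive = TRUE)
  }
  invisible(incoming_dir)
}

# Hypothetical helper: move every file in the incoming folder to the archive.
# The archive path is an assumption; the PR only calls it an "archive" folder.
archive_incoming_files <- function(incoming_dir = file.path("api", "incoming_data"),
                                   archive_dir = file.path("api", "archive")) {
  if (!dir.exists(archive_dir)) {
    dir.create(archive_dir, recursive = TRUE)
  }

  # List the folder contents, then drop any sub-directories so only files move.
  incoming_files <- list.files(incoming_dir, full.names = TRUE)
  incoming_files <- incoming_files[!dir.exists(incoming_files)]
  if (length(incoming_files) == 0) {
    return(invisible(character(0)))
  }

  # Keep the original file names; file.rename() returns TRUE/FALSE per file.
  destinations <- file.path(archive_dir, basename(incoming_files))
  moved <- file.rename(incoming_files, destinations)

  # Return only the paths that were actually moved.
  invisible(destinations[moved])
}
```

If these become targets, the archive step would probably need to take the final flagged dataset as an input so that {targets} runs it last. Also note that `file.rename()` can overwrite a file of the same name in the archive on most platforms, so API pulls should have unique file names (for example, a timestamp in the name).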

Juan, please review my 00_preparatory_workflow.Rmd to see the first step in creating the data that feeds the {targets} pipeline. This Rmd is where we pull in all the 2022-2023 data, remove the known instances of sensor malfunction, and develop the thresholds. It is also where I split the clean data up for {targets} testing purposes. Next, please run the {targets} pipeline to ensure it works on your end. Can’t wait to wrap this workflow up. 😄