rossyndicate / poudre_sonde_network

MIT License

Flagging data for believable range for CLP #97

Closed SamStruthers closed 8 months ago

SamStruthers commented 9 months ago

I have been making some summary tables for PWQN sites and have noticed that the new sites (Tamasag and River Bluffs) have a lot of erroneous data getting past QAQC. Much of it seems to come from periods when a sonde was buried in sediment or out of the water. Below is an example of pH data in April from Tamasag (points) and Legacy (black line). Some of the data are successfully being flagged (blue), but a considerable amount is getting past (red).

[Figure: Tamasag_pH_04_23_flags]

My guess is that because these sites are newer, the seasonal flag is rendered ineffective, and since the data do not violate the slope or sd flags, they are allowed to pass.

Possible solution: we could create Poudre-wide sensor spec ranges based on our four years of data and set some simple bounds for some parameters, e.g. pH in the Poudre is generally between 6 and 10 across all sites.

Alternatively, these data just need to be flagged manually, and the issue should resolve itself in future years once the seasonal flag has better bounds.

kathryn-willi commented 9 months ago

I like the idea of creating a flag from parameter bounds based on our understanding of the system, as you suggest here! We could call it a "realistic" flag haha. To run with this example, the flag would be: if pH >= 10 | pH <= 6, then "outside realistic range".
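A minimal sketch of that rule, written in Python purely for illustration (the project's own workflow code is not shown in this thread). The function name `realistic_range_flag`, its arguments, and the sample readings are hypothetical; only the bounds and the flag text come from the comment above.

```python
from typing import Optional


def realistic_range_flag(
    ph: Optional[float], lower: float = 6.0, upper: float = 10.0
) -> Optional[str]:
    """Return the flag text when pH is at or beyond either bound, else None."""
    if ph is None:
        # Assumption: missing readings are left for other QAQC checks.
        return None
    if ph >= upper or ph <= lower:
        return "outside realistic range"
    return None


# Made-up readings, only to exercise the rule.
readings = [7.8, 5.2, 10.0, 8.4, None]
flags = [realistic_range_flag(value) for value in readings]
```

Note that, as written in the rule, a reading exactly equal to 6 or 10 is flagged.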

SamStruthers commented 9 months ago

Did some investigating into this and found three parameters that I think this could be valid for: pH, SC and temp. Overall, I have learned that most of these data occur when the sonde is out of the water but the reading was either missed by the site visit flag, recorded during storage, or taken when the sonde was not properly deployed in the stream.

pH is definitely the most obvious, and I'm proposing the bounds 6 to 10. The plots below show all the pre-2023 data (raw) and the 2023 data (current QAQC applied). We could be more aggressive on the bounds (see timeline), but hopefully this flag will cause the slope flag to actually flag more data.

[Figures: pH_real_range, pH_real_timeline]

For SC, I'm proposing the bounds 30 to 2500. Few sites in the PWQN get above 2200 unless they are logging while being stored with pH 4. However, boxcreek (ag trib) does get above 2200, so I think 2500 is conservative enough to remove the really erroneous data without removing plausible data. On the lower end, our most upstream sites usually hover around 40 SC. I also checked the City's sonde data up the canyon (final plot in the SC section), and it never got below 30 unless there was an outlier.

[Figures: SC_real_range, SC_real_timeline]

[Figure: Manners_Cond]

For Temp, I am proposing the bounds -5 C to 30 C. I opted to make the lower bound the same as the sensor spec range, but we could certainly consider making it 0 C if we wanted to. Our current QAQC already seems to be good at removing these data, but this flag may be a good thing to add for cleaning up former years of data (fewer/no field notes).

[Figures: temp_real_range, temp_real_timeline]
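To make the three proposals concrete, here is a rough sketch of them as a single lookup table, written in Python purely for illustration (not the project's actual code). The names `REALISTIC_BOUNDS` and `flag_reading` are hypothetical, the sample values are made up, and flagging a value that sits exactly on a bound is an assumption carried over from the >= / <= rule suggested earlier in the thread.

```python
from typing import Optional

# Proposed Poudre-wide bounds from this comment: (lower, upper).
REALISTIC_BOUNDS = {
    "pH": (6.0, 10.0),
    "SC": (30.0, 2500.0),
    "Temp": (-5.0, 30.0),  # degrees C; lower bound matches the sensor spec range
}


def flag_reading(parameter: str, value: Optional[float]) -> Optional[str]:
    """Return the flag text for a reading at or beyond its bounds, else None."""
    bounds = REALISTIC_BOUNDS.get(parameter)
    if bounds is None or value is None:
        # Assumption: parameters without proposed bounds, and missing
        # readings, are left to the existing QAQC flags.
        return None
    lower, upper = bounds
    if value <= lower or value >= upper:
        return "outside realistic range"
    return None


# Made-up readings, only to exercise the lookup.
samples = [("pH", 4.1), ("SC", 2300.0), ("SC", 12.0), ("Temp", 31.5), ("Temp", 12.0)]
results = [flag_reading(parameter, value) for parameter, value in samples]
```

With these bounds, an SC reading of 2300 (the boxcreek case) passes, while an SC reading of 12 is flagged.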

Thoughts?

My thought for implementation is to add this wherever add_spec_flag is used. That appears to be the following sections of the targets workflow: make_threshold_table and all_data_flagged.