This PR does not close any issues. Here is the state in which I am leaving this project after some smaller edits and a lot of data exploration. I would appreciate some help with the large anomaly flag function.
Added an sv window to the site visit flag so that the 45 minutes following a site visit are also flagged
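For reviewers, the site visit window could look roughly like the sketch below. It assumes the df has a DatetimeIndex and an object-typed `flag` column; the function name, the `visit_times` argument and the flag text are hypothetical and not taken from the code in this PR.

```python
import pandas as pd

def add_site_visit_flag(df, visit_times, sv_window=pd.Timedelta(minutes=45),
                        flag="site visit"):
    """Flag every row from a visit time up to sv_window after it (inclusive)."""
    out = df.copy()
    for visit in pd.to_datetime(visit_times):
        in_window = (out.index >= visit) & (out.index <= visit + sv_window)
        # Assumption: a site visit flag overwrites whatever was there before.
        out.loc[in_window, "flag"] = flag
    return out

# Synthetic 15-minute data, one visit at 00:30.
idx = pd.date_range("2021-06-01 00:00", periods=8, freq="15min")
df = pd.DataFrame({"value": range(8), "flag": [None] * 8}, index=idx)
flagged = add_site_visit_flag(df, ["2021-06-01 00:30"])
flagged_times = list(flagged.index[flagged["flag"].notna()].strftime("%H:%M"))
```

With 15-minute data this marks the visit row plus the three rows that follow it.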
Added large anomaly flag function (help here would be appreciated)
The idea is that this function flags large swaths of bad data. When a window contains a lot of bad data followed by reasonable-looking data, we want to flag the reasonable data in that window as well, because it has become suspect (it is hard to trust after so much weirdness in the data). This is done by:
Creating the `flag_binary` column to track whether each row has a flag in the flag column.
Then creating the `roll_bin` column, which sums `flag_binary` over a 24hr rolling window centered on each point.
Finally, adding the "24hr anomaly flag" to the flag column wherever a quarter of the data in the 24hr rolling window has flags, and returning the df.
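The three steps above can be sketched as follows. This is a minimal reconstruction for discussion, not the code in this PR: the function name, the `window` and `threshold` parameters, the use of `min_periods=1` at the edges, and the choice to label only rows that have no flag yet are all assumptions.

```python
import pandas as pd

def add_large_anomaly_flag(df, window=97, threshold=0.25,
                           flag="24hr anomaly flag"):
    out = df.copy()
    # 1 where the row already carries any flag, 0 otherwise.
    out["flag_binary"] = out["flag"].notna().astype(int)
    # Centered rolling count of flagged rows; min_periods=1 keeps the edges.
    out["roll_bin"] = (
        out["flag_binary"].rolling(window, center=True, min_periods=1).sum()
    )
    # Suspect rows: at least `threshold` of a full window is flagged.
    suspect = out["roll_bin"] >= threshold * window
    # Assumption: only rows without an existing flag receive the new label.
    out.loc[suspect & out["flag"].isna(), "flag"] = flag
    return out

# Synthetic example: two days of 15-minute data with 40 flagged rows in a block.
idx = pd.date_range("2021-06-01", periods=192, freq="15min")
flags = [None] * 192
flags[40:80] = ["outlier"] * 40
df = pd.DataFrame({"value": 0.0, "flag": flags}, index=idx)
result = add_large_anomaly_flag(df)
new_flags = int((result["flag"] == "24hr anomaly flag").sum())
```

Because `flag_binary` is computed before the new label is written, a single pass does not count its own "24hr anomaly flag" rows. On this synthetic data, a block of 40 flagged rows still pulls in 48 neighbouring rows, which illustrates how far a 97-point centered window reaches.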
Issues I am having:
The flag is extremely inclusive (97 data point, i.e. 24 hour, rolling window).
I thought that setting rows where `flag == "24hr anomaly flag"` to 0 would make it less inclusive, but I think it is still too inclusive to be effective.
Another idea would be to reduce the window size, but I haven’t experimented with that yet.
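One way to try the smaller-window idea is to compare how much data would be marked suspect for a few window sizes. The sketch below uses synthetic data and a hypothetical helper, `suspect_share`; the 25% threshold comes from the description above, while the candidate window sizes are arbitrary.

```python
import pandas as pd

def suspect_share(flag_binary, window, threshold=0.25):
    """Fraction of rows whose centered window is at least `threshold` flagged."""
    roll = flag_binary.rolling(window, center=True, min_periods=1).sum()
    return float((roll >= threshold * window).mean())

# Synthetic 0/1 series: 192 rows (two days at 15 minutes), 40 of them flagged.
flag_binary = pd.Series([0] * 192)
flag_binary.iloc[40:80] = 1

# Candidate windows of roughly 6, 12 and 24 hours at 15-minute resolution.
shares = {w: suspect_share(flag_binary, w) for w in (25, 49, 97)}
```

On this toy series the suspect share shrinks as the window shrinks, but the real data would need the same comparison before settling on a size.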
Altered the function that generates daily plots
Added if/else statements to bypass some obvious errors when iterating over all the data.
Added y_min and y_max hlines to the plots that are generated to easily discern when points are out of seasonal ranges.
Sorted the output of the function.
Added new histograms to data folder
Same as before, just updated with the new seasons that we are using and with vlines added to show how much of the data is being excluded by the quantiles we are using (10 & 99).
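To put a number next to the vlines, the excluded share can be computed directly. This sketch assumes that "10 & 99" means the 10th and 99th percentiles and uses a synthetic series in place of one season of data; the variable names are illustrative.

```python
import pandas as pd

# Synthetic stand-in for one season of observations.
values = pd.Series(range(1, 101), dtype=float)

# Assumed cutoffs: 10th and 99th percentiles (pandas default linear interpolation).
lower, upper = values.quantile([0.10, 0.99])

# Points that fall outside the cutoffs, i.e. left of the lower vline
# or right of the upper vline on the histogram.
excluded = int(((values < lower) | (values > upper)).sum())
excluded_share = excluded / len(values)
```

The `lower` and `upper` values are the x positions where the vlines would be drawn.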