zhrandell / Seattle_Aquarium_CCR_analytical_resources

This is a public repository to organize information pertaining to the cleaning, analysis, and visualization of ROV telemetry and spatial data, as well as preliminary information related to the % cover analyses (via CoralNet) of image stills derived from ROV video.

Working with Ping data to filter sample units (rows) down to survey transects #6

zhrandell closed this issue 1 year ago

zhrandell commented 1 year ago

Notes for myself on things I need to do:

zhrandell commented 1 year ago

Okay, @m-h-williams, I've made a little progress here. There are a couple of quirks about how we need to select data based on altitude, so (for now) I took a shortcut and (gasp) directly edited the .csv file to trim it down to our exact transects. This is a bit of a faux pas, but I wanted to keep us moving, so alas.
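For future reference, here's a minimal sketch of how that trimming could be done in code rather than by hand, assuming we record a start and end timestamp for each transect (the file names, the timestamp column, and the start/end values below are hypothetical placeholders):

library(dplyr)
library(readr)

# hypothetical start/end times for one transect
t_start <- as.POSIXct("2023-05-01 10:15:00", tz = "UTC")
t_end   <- as.POSIXct("2023-05-01 10:45:00", tz = "UTC")

# subset the full Ping record down to the rows collected during the transect
ping <- read_csv("ping_data.csv") %>%
  filter(timestamp >= t_start, timestamp <= t_end)

write_csv(ping, "transect_01.csv")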

Once we have individual .csv files for each transect, we can visualize the data, e.g.:

[figure: trimmed_with_error]

You can see the numerous erroneous ping altitude records. To clean these up, we can run the following:

library(dplyr)

# replace any ping altitude > 1.5 m with NA
dat <- dat %>%
  mutate(smoothed = ifelse(avg_dist > 1.5, NA, avg_dist))

which removes the erroneously large values and replaces them with NA, like this:

  avg_dist avg_conf smoothed
1  1.48   100.00   1.48
2  1.46   100.00   1.46   
3  4.97   100.00   NA
4  5.21   100.00   NA
5  1.43   100.00   1.43
6  1.44   100.00   1.44

We can then run na.approx() from library(zoo) to interpolate the missing values, like this:

library(zoo)

# linearly interpolate across the NA gaps
dat <- dat %>%
  mutate(smoothed = na.approx(smoothed))

NOTE that na.approx() requires real values on either side of an NA run, i.e., the column can't begin or end with NA for the interpolation to work.
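If a transect does happen to begin or end with NA, one workaround (just an option, not something we've adopted) is to have na.approx() extend the nearest real value out to the ends via rule = 2, which it passes through to approx():

library(zoo)

# rule = 2 carries the closest non-NA value out to the start/end of the series,
# so leading/trailing NAs no longer break the interpolation
dat <- dat %>%
  mutate(smoothed = na.approx(smoothed, na.rm = FALSE, rule = 2))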

Running na.approx() on the above data produces:

  avg_dist avg_conf smoothed
1  1.48   100.00   1.48
2  1.46   100.00   1.46   
3  4.97   100.00   1.45
4  5.21   100.00   1.44
5  1.43   100.00   1.43
6  1.44   100.00   1.44

and when applied to the whole data frame, produces the following:

[figure: trimmed_with_errors_interpolated]

zhrandell commented 1 year ago

Closing this for now, though I'll note that we never came up with a true rules-based method of filtering altitude values based on, e.g., rates of change along a rolling window. Instead, we set an arbitrary 1.5 m threshold above which values are tossed out. This is somewhat defensible given that the vehicle does not exceed an altitude of 1.5 m (ideally not > 1.2 m) during surveys, though a more elegant solution would be preferred.
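For posterity, a rough sketch of what such a rolling-window filter might look like (purely illustrative; the window size and threshold below are arbitrary placeholders, not values we've tested):

library(dplyr)
library(zoo)

window <- 9       # number of pings in the rolling window
threshold <- 0.5  # flag pings > 0.5 m away from the local median

dat <- dat %>%
  mutate(
    # rolling median of altitude, centered on each ping
    roll_med = rollapply(avg_dist, window, median, fill = NA, align = "center"),
    # flag pings that deviate sharply from their local neighborhood
    smoothed = ifelse(!is.na(roll_med) & abs(avg_dist - roll_med) > threshold,
                      NA, avg_dist)
  ) %>%
  # then interpolate the flagged values as before
  mutate(smoothed = na.approx(smoothed, na.rm = FALSE, rule = 2))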