Closed by zhrandell 1 year ago
Okay, @m-h-williams, I've made a little progress here. There are a couple of quirks in how we need to select data based on altitude, so (for now) I took a shortcut and (gasp) directly edited the .csv file to trim it down to our exact transects. This is a bit of a faux pas, but I wanted to keep us moving here, so alas.
Once we have individual .csv files for each transect, we can visualize the data, e.g.:
You can see the numerous erroneous ping altitude records. To clean these up, we can run the following:
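library(dplyr)

dat <- dat %>%
  bind_rows() %>%  # stack the pieces into one data frame (no change if dat already is one)
  mutate(smoothed = ifelse(avg_dist > 1.5, NA, avg_dist))  # altitudes above 1.5 m become NA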
which replaces the erroneously large values with NA, like this:
avg_dist avg_conf smoothed
1 1.48 100.00 1.48
2 1.46 100.00 1.46
3 4.97 100.00 NA
4 5.21 100.00 NA
5 1.43 100.00 1.43
6 1.44 100.00 1.44
We can then run na.approx() from library(zoo) to interpolate the missing values, like this:
library(zoo)
dat <- dat %>%
mutate(smoothed = na.approx(smoothed))
NOTE that na.approx() requires real values on either side of a run of NAs: the column can't begin (or end) with NA for the interpolation to work.
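A minimal sketch of that edge behavior, using a made-up four-value vector rather than our transect data. By default na.approx() uses na.rm = TRUE, which drops leading and trailing NAs, so the result comes back shorter than the input (and would not fit back into the dataframe column). Passing na.rm = FALSE keeps the length and leaves the edge NA in place:

```r
library(zoo)

# Toy altitude vector (made-up values): one interior gap, one trailing NA
x <- c(1.48, NA, 1.44, NA)

# Default (na.rm = TRUE): the interior gap is filled, the trailing NA is dropped
trimmed <- na.approx(x)

# na.rm = FALSE: same interpolation, but the trailing NA is kept in place
kept <- na.approx(x, na.rm = FALSE)

length(trimmed)  # shorter than x
length(kept)     # same length as x
```

If edge NAs do show up in a transect, na.rm = FALSE at least keeps mutate() happy, though those edge values would still need to be handled some other way.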
Running na.approx() on the above data produces:
avg_dist avg_conf smoothed
1 1.48 100.00 1.48
2 1.46 100.00 1.46
3 4.97 100.00 1.45
4 5.21 100.00 1.44
5 1.43 100.00 1.43
6 1.44 100.00 1.44
and when applied to the whole dataframe, produces the following:
Closing this for now, though I'll note that we never came up with a true rules-based method of filtering altitude values based on, e.g., rates of change along a rolling window. Rather, we set an arbitrary 1.5 m threshold above which values are tossed out. This is somewhat defensible given that the vehicle does not exceed an altitude of 1.5 m (ideally not > 1.2 m) during surveys, though a more elegant solution would be preferred.
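For future reference, here is one rough sketch of what a rolling-window rule could look like. This is not what we actually ran: it swaps the fixed 1.5 m cutoff for a comparison against a centered rolling median, and the helper name flag_spikes(), the window width of 5 pings, and the 0.5 m tolerance are all placeholder assumptions that would need tuning against real transects.

```r
library(zoo)

# Hypothetical helper: flag pings that sit far from a centered rolling median.
# `window` (pings) and `tol` (metres) are placeholder values, not tuned settings.
flag_spikes <- function(alt, window = 5, tol = 0.5) {
  # Centered rolling median; positions without a full window are filled with NA
  med <- as.numeric(rollapply(alt, width = window, FUN = median, na.rm = TRUE,
                              fill = NA, align = "center"))
  # At the edges, fall back to the median of the whole series
  med[is.na(med)] <- median(alt, na.rm = TRUE)
  abs(alt - med) > tol
}

# The six example pings from the table above
alt <- c(1.48, 1.46, 4.97, 5.21, 1.43, 1.44)
spikes <- flag_spikes(alt)

# Same downstream step as before: flagged pings become NA, ready for na.approx()
smoothed <- ifelse(spikes, NA, alt)
```

On the six example pings this flags the same two records as the 1.5 m cutoff. A median is used rather than a mean so that a short burst of bad pings does not drag the reference value upward, but long runs of bad pings (more than half the window) would still slip through.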
Notes for myself that I need to: