zhrandell / Seattle_Aquarium_CCR_analytical_resources

This is a public repository to organize information pertaining to the cleaning, analysis, and visualization of ROV telemetry and spatial data, as well as preliminary information related to the % cover analyses (via CoralNet) of image stills derived from ROV video.

cleaning & filtering GPS tracks #2

Closed zhrandell closed 1 year ago

zhrandell commented 1 year ago

The purpose of this issue is to aggregate information pertaining to working with GPS tracks, including cleaning erroneous points and separating GPS points associated with survey transects.

As an example, we can use the lat and lon columns in the cleaned telemetry file located here. Each row corresponds to 1 s; i.e., we have telemetry data, including GPS coordinates, at one-second resolution.

If you run the code (located here) to clean and filter the GPS data, you'll notice that there are approximately a dozen GPS points with clearly incorrect readings, with lat and lon values scattered across the continental United States. I removed these with:

https://github.com/zhrandell/Seattle_Aquarium_ROV_telemetry_and_mapping/blob/30a745d701a7f7c69067ea3924a6d023e2328461/code/tracklog_cleaning.R#L72-L73

(the cleaned CSV file linked above already has these erroneous points removed).

We can then plot the remaining GPS points with

https://github.com/zhrandell/Seattle_Aquarium_ROV_telemetry_and_mapping/blob/30a745d701a7f7c69067ea3924a6d023e2328461/code/tracklog_cleaning.R#L118-L122

A screenshot from the attached map indicates that our remaining GPS points are largely spot-on for our dive at Mushroom Rock, east of Cape Flattery in the western Strait of Juan de Fuca:

(screenshots: MapZoomedOut, MapZoomed)

Notice, however, that there are several instances of "spikes" in the GPS tracklog, where the ROV seemingly moves a large distance (relative to other step-lengths) before shifting back to its original position. These are almost certainly erroneous readings. I am uncertain of their source, though it is likely an acoustic shadow, or otherwise some issue transmitting the acoustic information appropriately to the antenna array.

Whatever their source, these localized instances of erroneous GPS coordinates will need to be dealt with. The first step is to translate the GPS coordinates into realized geometric measurements, e.g., how many meters has the ROV moved each second. Once achieved, we can calculate, e.g., the average step-length the ROV takes when conducting surveys or otherwise behaving in a normal fashion (note that this would require isolating periods of normal activity). We can then filter coordinates that grossly exceed the average step-length of the ROV. This should "connect" the two accurate GPS points that flank the erroneous points on either side, producing a seamless, albeit slightly-processed, trackline.
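The two steps above can be sketched roughly as follows (a sketch, not the final implementation: the lat/lon column names come from the telemetry file above, while the step_m/spike column names and the k = 10 multiplier are illustrative assumptions), using the geosphere package to turn coordinate pairs into meters:

```r
library(geosphere)  # for distHaversine()

## compute the per-second step length (meters) between consecutive GPS fixes
add_step_length <- function(df){
  n <- nrow(df)
  ## distHaversine() takes matrices of (lon, lat) pairs
  p1 <- cbind(df$lon[-n], df$lat[-n])
  p2 <- cbind(df$lon[-1], df$lat[-1])
  df$step_m <- c(NA, distHaversine(p1, p2))
  return(df)
}

## flag fixes whose step length grossly exceeds the typical step,
## e.g., k times the median step length (k = 10 is a placeholder,
## and ideally the median is computed over periods of normal activity)
flag_spikes <- function(df, k = 10){
  typical <- median(df$step_m, na.rm = TRUE)
  df$spike <- !is.na(df$step_m) & df$step_m > k * typical
  return(df)
}
```

Rows where spike is TRUE could then be dropped, connecting the accurate flanking points.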

zhrandell commented 1 year ago

Circling back here: I've used the trajr package (see, e.g., here), intended for the analysis of animal trajectories derived from GPS monitoring, to smooth our ROV GPS tracks slightly.

For example, with the ROV telemetry file GPS_test.csv from offshore of Brown Island in the San Juan Islands, we see the (slightly) jittery back-and-forth that characterizes normal functioning of WaterLink's GPS system; e.g., here is a Leaflet map of the raw GPS tracks:

(screenshot: GPS_raw)

We'll first write a function to create a couple new columns, since the current ROV telemetry file has been extracted from a previous, larger file.

prep <- function(df){
  df$clock <- seq(1, nrow(df), by=1)         # elapsed time in seconds
  df$clock_min <- seq(1, nrow(df), by=1)/60  # elapsed time in minutes
  return(df)
}

This simply creates two new columns: the first is 1:nrow(df) in units of one, and the second is 1:nrow(df)/60, which, because the sampling interval of our telemetry file is one second, gives elapsed time in minutes (the latter column might be nice to have down the road, e.g., for plotting depth vs. time for our transects).

clock  clock_min
1   0.01666667
2   0.03333333
3   0.05000000
4   0.06666667
5   0.08333333
6   0.10000000

We won't call the function prep() directly; instead, we'll embed it in our second function. First though, we need to make sure we run install.packages("trajr") and library(trajr). Once that's done, we can write and run a second, primary function:

smoother <- function(df, poly, bin){
  df <- prep(df)                                  # add the clock columns
  coords <- data.frame(df$lat, df$lon, df$clock)  # the layout trajr expects
  traj <- TrajFromCoords(coords)                  # build a trajectory object
  smooth <- TrajSmoothSG(traj, p=poly, n=bin)     # Savitzky-Golay smoother
  df$lat_smooth <- smooth$x                       # bind smoothed coordinates
  df$lon_smooth <- smooth$y                       # back onto the data frame
  return(df)
}

df <- prep(df) invokes the first function; coords <- data.frame(df$lat, df$lon, df$clock) creates a new data frame in the precise layout required by trajr; traj <- TrajFromCoords(coords) creates a trajectory object; and smooth <- TrajSmoothSG(traj, p=poly, n=bin) applies a smoother based on the Savitzky-Golay filter. Finally, the last two lines, starting with df$lat_smooth and df$lon_smooth, simply bind the newly smoothed GPS coordinates to our original data frame.

Note the inputs to smoother(df, poly, bin): df is simply the data frame, and will likely be dat in most applications. The latter two inputs are parameters of the Savitzky-Golay filter: poly is the polynomial order, and bin is the filter window width, which must be odd and greater than poly.
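To build intuition for those two parameters without touching our telemetry, here's a standalone illustration of the same family of filter using the signal package's sgolayfilt() on a synthetic noisy sine (the specific p and n values are illustrative, not recommendations for our tracks):

```r
library(signal)  # for sgolayfilt()

set.seed(1)
t <- seq(0, 2 * pi, length.out = 200)
noisy <- sin(t) + rnorm(200, sd = 0.2)

## p = polynomial order, n = window width (odd, and > p);
## a larger n smooths more aggressively, a larger p preserves more detail
smooth_mild  <- sgolayfilt(noisy, p = 4, n = 21)
smooth_heavy <- sgolayfilt(noisy, p = 4, n = 101)
```

Plotting noisy against smooth_mild and smooth_heavy shows the trade-off: the wide window flattens the noise but also erodes genuine turns, which is exactly the balance we're striking with the bin argument on real tracks.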

We can see this by running:

output <- smoother(dat, 4, 177)

which, via the following leaflet code:

map <- leaflet() %>%
  addTiles() %>%
  addPolylines(data=dat, lat=~lat, lng=~lon, weight=1, color="black", opacity=1) %>%
  addPolylines(data=output, lat=~lat_smooth, lng=~lon_smooth, weight=2, color="red", opacity=1)

map

produces the following:

GPS_smoothed

@m-h-williams, one thing about trajr though . . . in order to calculate the length of the trajectories they need to be spatially referenced, with units specified. Right now we're simply applying a mathematical function to two numeric columns which just happen to be GPS coordinates. So, instead of futzing with the trajr distance functions, could you please try applying the code you wrote to create the column EucDIS on the new lat / lon columns lat_smooth & lon_smooth?

Thank you! And please let me know if you run into any trouble!

zhrandell commented 1 year ago

@m-h-williams, I want to add a couple more snippets of code here that may be occasionally useful for processing GPS tracks. I don't think any of these functions will be used every time, but I've seen instances in the GPS data where either one or both of these snippets could be used; I anticipate usage will be case-by-case. Note that these do require we have EucDIS, the Euclidean distance traveled every second by the ROV.

First, a scenario where there is a SINGLE erroneous GPS coordinate that is, e.g., in the Atlantic Ocean, and really needs to be removed (because it is, e.g., throwing off our smoothing function above). Given that the distance traveled, i.e., EucDIS, requires two points to make a line, we can remove the 2nd point that produced the long step-length. First we identify any rows with a long step-length:

cutoff <- 10
dat$lat[which(dat$EucDIS >= cutoff)] <- NA  # NA (not the string "NA"), so the columns stay numeric
dat$lon[which(dat$EucDIS >= cutoff)] <- NA
dat <- dat %>% drop_na(c(lat, lon))         # requires tidyr (and dplyr for %>%)

In this case, cutoff <- 10 says that any row with EucDIS of 10 m or greater will have its lat and lon set to NA; drop_na() then simply deletes those rows.

However, as we sometimes see, there may be an erroneous point with a very long step-length, followed by several points that exhibit small step-lengths in the wrong area; i.e., the GPS is "locked" into the incorrect position. In this case, if we delete just a single GPS point, the plotted track will simply connect to the second point that is still in the erroneous position. We can correct for this by deleting not only the points with erroneously large step-lengths, but also any number of rows following those points. For example:

cutoff <- 2
dat$lat[dat$EucDIS >= cutoff] <- "remove"
dat$lon[dat$EucDIS >= cutoff] <- "remove"

Now, instead of NA, we swap out the lat and lon values for the character string remove (because NA is special and the following grep function doesn't work with it). Next, we identify all of those remove rows; their positions become the vector inds (short for indices). We can then delete all inds cases AND, e.g., the row after each inds instance, i.e., inds + 1. Finally, we set both lat and lon back to numeric vectors (the character assignment coerced them).

inds <- grep("remove", dat$lat)
dat <- dat[-unique(c(inds, inds + 1)), ]
dat$lon <- as.numeric(dat$lon)
dat$lat <- as.numeric(dat$lat)

If we want to remove the next three rows, the one line that changes becomes:

 dat <- dat[-unique(c(inds, inds + 1, inds + 2, inds + 3)), ]

If we want to remove the next AND the preceding row, the one line that changes becomes:

dat <- dat[-unique(c(inds, inds + 1, inds - 1)), ]

There's probably a way of writing this with a custom-sized window (instead of manually specifying each offset), but as is, inds + 5 would delete the remove row and the 5th row after remove (and NOT all 5 rows after remove).
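One way to get that custom-sized window (a sketch; remove_window, window, and before are hypothetical names, not anything already in our scripts) is to expand inds with all the offsets via outer() before subsetting:

```r
## remove each flagged row plus the `window` rows after it,
## and optionally the row preceding it as well
remove_window <- function(df, inds, window = 1, before = FALSE){
  offsets <- if (before) -1:window else 0:window
  drop <- unique(as.vector(outer(inds, offsets, `+`)))  # every index in each window
  drop <- drop[drop >= 1 & drop <= nrow(df)]            # guard against out-of-range rows
  if (length(drop) == 0) return(df)                     # nothing flagged: return unchanged
  df[-drop, ]
}

## e.g., delete every "remove" row and ALL 5 rows that follow each one:
## dat <- remove_window(dat, inds, window = 5)
```

This covers both variants above: remove_window(dat, inds) reproduces the inds + 1 case, and before = TRUE reproduces the inds - 1 case.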

zhrandell commented 1 year ago

closing to archive