Yes, you're right, this is not really what was originally intended, but the problem is computational efficiency. I will find a solution for the next update, but if you have any further ideas, please post them here. Thanks
As an excuse to try out ChatGPT, I actually submitted the problem to it this morning, and it found a really inefficient solution using two nested `for` loops (still, I was completely amazed!). I asked for a more efficient solution using `cumsum` and `difftime`, but it wasn't working properly. For now, the only thing I can think of is a `while` loop that randomly deletes observations based on a `diff` until the time gap is reached (a rough sketch follows below). Ideally, one would optimize this to minimize the largest gap between two consecutive observations, but I'm not sure how to do that. One could also optimize for keeping the maximum number of observations; I'm not sure whether these two objectives can have different solutions. Indeed, the problem is a bit trickier than I initially thought. If I find something, I will let you know.
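For concreteness, here is a minimal sketch of that random-deletion idea (my own illustration, not aniMotum code; `times` is assumed to be a `POSIXct` vector and `min.dt` a gap in seconds):

```r
# Sketch: repeatedly drop one observation from a randomly chosen pair of
# neighbours that violates the threshold, until all gaps are >= min.dt.
thin_random <- function(times, min.dt) {
  keep <- sort(times)
  repeat {
    gaps <- diff(as.numeric(keep))   # consecutive gaps in seconds
    bad <- which(gaps < min.dt)      # indices of violating pairs
    if (length(bad) == 0) break
    i <- if (length(bad) == 1) bad else sample(bad, 1)
    keep <- keep[-(i + 1)]           # drop the later observation of the pair
  }
  keep
}
```

Because the deletions are random, the resulting maximum gap is not controlled, which is exactly the weakness noted above.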
Hi, is there any update on this issue? I think I have a similar problem: I have GPS data with intervals of sometimes only 1 second, and `fit_ssm` produces an `inf` error. But if I set `min.dt` to 2, everything is thrown out.
I will have time to work on this issue in 1-2 weeks. In the meantime, you will have to filter your data prior to passing them to `fit_ssm`. In your case, a simplistic solution would be to pass every 2nd observation to `fit_ssm`, as sketched below.
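A minimal illustration of that interim workaround (assuming `track` is a data frame with the usual `id`, `date`, `lc`, `lon`, `lat` columns; the `time.step` value is only an example):

```r
library(aniMotum)

# keep every 2nd row before fitting
track2 <- track[seq(1, nrow(track), by = 2), ]
fit <- fit_ssm(track2, model = "crw", time.step = 10/3600)
```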
A few weeks (months?) ago I actually coded a decent solution for filtering according to a threshold; I just need to make the code a bit more robust and clean before submitting a pull request. It uses a `while` loop that removes observations until the threshold is met, and it tries to minimize the maximum gap between observations (a rough sketch of the idea follows below). It is not the fastest solution, but it is quick enough and probably sufficient for any realistic dataset. I could work on this in the coming weeks if needed.
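To make the greedy idea concrete, here is a hedged sketch (my illustration only, not the code from the eventual pull request): while any consecutive gap is below the threshold, remove the interior point whose removal produces the smallest merged gap, which helps keep the largest gap small.

```r
# times: POSIXct vector; min.dt: minimum allowable gap in seconds
thin_min_gap <- function(times, min.dt) {
  o <- order(times)
  t <- as.numeric(times)[o]
  keep <- seq_along(t)                  # indices of retained observations
  while (length(keep) > 2 && any(diff(t[keep]) < min.dt)) {
    tk <- t[keep]
    gaps <- diff(tk)
    # removing interior point i merges gaps i-1 and i into tk[i+1] - tk[i-1]
    merged <- tk[3:length(tk)] - tk[1:(length(tk) - 2)]
    # candidates: interior points adjacent to a violating gap
    viol <- which(gaps < min.dt)
    cand <- intersect(unique(c(viol, viol + 1)), 2:(length(tk) - 1))
    if (length(cand) == 0) break        # endpoints are always retained
    i <- cand[which.min(merged[cand - 1])]
    keep <- keep[-i]
  }
  times[o][keep]
}
```

One could compare this against the random-deletion version on the size of the largest remaining gap and the number of observations retained.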
Thanks so much, Ian and François! Yes, it would help me a lot. Actually, we are using the package here in Norway at the marine institute, not only for animal movements but also for movements of underwater vehicles (such as ROVs). I can send a dataset (with data for every second) if you want to try a real dataset ;) In general, we would like the model to smooth out high-data-density areas (where the vehicle stops).
@frousseu submit the pull request when you're ready; I can worry about cleaning it up, and I will need to test/tweak for computational efficiency before I merge it into the master branch. @MonaFuhrmann if you can share the data here then we can both use it for testing - thanks.
ROV_24.txt

This is an example track (time = UTC, lat/lon in decimal degrees). Unfortunately, I have no info on speed (so identifying those sections where the ROV stops is challenging).
@MonaFuhrmann I am finally able to address this issue. Would you be able to post the aniMotum-relevant code you're using, so I can see how you're fitting to the ROV_24.txt data? I can guess, but there are a few different ways to fit to non-Argos location data. Thanks
Hi @ianjonsen, sorry for the late reply - I am posting the code I have used and a corresponding example file, ROV_15_test (this is similar to ROV_24.txt). ROV_15_test.txt
```r
# Load libraries
library(aniMotum)
library(sf)
library(geosphere)

track <- read.table("ROV_15_test.txt", sep = '\t', header = TRUE)
track

track$date <- as.POSIXct(strptime(paste(track$Date, track$Time), '%d.%m.%Y %H:%M:%S'), tz = 'UTC')
track$id <- rep('ROV_15_test', nrow(track))
track$lc <- rep('G', nrow(track))
names(track)[match('Longdec', names(track))] <- 'lon'
names(track)[match('Latdec', names(track))] <- 'lat'
track <- track[, c('id', 'date', 'lc', 'lon', 'lat')]

# with pf = TRUE we can see the removed observations (extremes, keep = FALSE)
# without fitting an SSM
inspect <- fit_ssm(track, time.step = 0.00277777778, min.dt = 1, distlim = c(3, 3), pf = TRUE)

# time.step = 0.00277777778 h predicts every 10 seconds; min.dt = minimum
# allowable time difference between observations (in seconds); distlim =
# lengths (min, max in metres) of outlier location "spikes"
ssm_fit <- fit_ssm(track, time.step = 0.00277777778, model = "crw", min.dt = 1, distlim = c(3, 3))

# fits is a simple feature collection object
fits <- st_transform(ssm_fit$ssm$ROV_15_test$fitted, crs = st_crs('epsg:4326'))
fitted <- st_coordinates(fits)  # coordinates only

predicted <- st_transform(ssm_fit$ssm$ROV_15_test$predicted, crs = st_crs('epsg:4326'))
predicted <- st_coordinates(predicted)

plot(ssm_fit, what = "fitted")

# blue = observations, gold = predicted, X = excluded values; uncertainty is
# displayed as a ± 2 SE envelope (pale gold) around estimates; note the larger
# standard errors for predicted locations through data gaps
plot(ssm_fit, what = "predicted")

# X = excluded values; blue = observations, gold = predicted track, with
# uncertainty shown as pale gold 95% confidence ellipses; alpha = transparency
plot(ssm_fit, "p", type = 2, alpha = 0.1)

require(patchwork)
res.crw <- osar(ssm_fit)
(plot(res.crw, type = "ts") | plot(res.crw, type = "qq")) /
  (plot(res.crw, type = "acf") | plot_spacer())

class(fits)  # class sf
attr(fits, "sf_column")
fits[1:3]

data_ssm <- grab(ssm_fit, what = "predicted", as_sf = FALSE, normalise = FALSE, group = FALSE)
data_ssm

# coordinates only - maybe compare to the data frame
write.table(fitted, file = "ROV 15_fitted.txt")
write.table(predicted, file = "ROV 15_predicted.txt")
```
Hi @MonaFuhrmann, thanks for the code and additional test data. I've pushed an update to the `dev` branch that should solve your problem. You can install the `dev` branch via:

```r
remotes::install_github("ianjonsen/aniMotum@dev", dependencies = TRUE)
```

In this update, the `min.dt` argument is set to 0 by default, so only subsequent observations occurring at exactly the same time get filtered out. It shouldn't be necessary to change the default `min.dt` in most cases. I've also changed some of the data-prep code: previously, prediction times that exactly matched observation times were adjusted by 1 s. This failed with your data because the observations are at 1 s intervals, so all prediction and observation times matched exactly, and this caused the model-fit errors. The code now adjusts prediction times by 0.5 s. The following now runs without error:
```r
require(tidyverse)
require(aniMotum)

d <- read_csv("ROV_24.txt") %>%
  mutate(date = lubridate::ymd_hms(paste(date, time), tz = "UTC")) %>%
  mutate(id = 1) %>%
  mutate(lc = "G") %>%
  select(id, date, lc, lon, lat)

## turn off estimation of rho_o (observation error correlation term)
fit <- fit_ssm(d, model = "crw", time.step = 10/3600, map = list(rho_o = factor(NA)))

plot(fit, "p")
```
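As an aside, a minimal sketch of the prediction-time adjustment described above (my paraphrase of the fix, not the exact aniMotum internals; `d$date` and the 10 s step are reused from the example purely for illustration):

```r
# shift any prediction time that coincides exactly with an observation time
# by 0.5 s, so the two sets of times never clash
obs.times <- d$date
pred.times <- seq(min(obs.times), max(obs.times), by = 10)  # 10 s time.step
clash <- as.numeric(pred.times) %in% as.numeric(obs.times)
pred.times[clash] <- pred.times[clash] + 0.5
```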
Closing this issue - following the above change to the default `min.dt` argument, setting `min.dt > 0` should not generally be necessary.
The above fixes are now available in the `staging` branch, which is more stable than `dev`. Install via:

```r
remotes::install_github("ianjonsen/aniMotum@staging", dependencies = TRUE)
```
Hello!

The `min.dt` argument does not seem to work as intended, at least not the way I expected it to. Instead of keeping all observations that are separated by at least the specified time gap, it seems to filter out every location that has a neighbouring location within the specified time gap. This discards whole blocks of consecutive observations where several could be retained if only some observations were discarded. Perhaps the filtering is done on the result of a `diff`? Here is an example of what is kept with `min.dt = 72` hours: only the observations with `diff >= min.dt` are kept, but several more could have been kept (see the toy sketch below). I haven't thought yet of an efficient way to keep all observations that respect the `min.dt` threshold.
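To make this concrete, here is a toy illustration (a sketch assuming the prefilter drops any location whose gap to either neighbour is below `min.dt`, which would explain the discarded blocks; the times are made up):

```r
times <- as.POSIXct("2023-01-01", tz = "UTC") + c(0, 80, 81, 160) * 3600
min.dt <- 72 * 3600                      # 72 h in seconds
gap <- diff(as.numeric(times))
too.close <- c(gap, Inf) < min.dt | c(Inf, gap) < min.dt
times[!too.close]  # keeps only the 0 h and 160 h observations
# yet keeping one of the 80 h / 81 h pair (e.g. 0, 80, 160) would respect min.dt
```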
Happy holidays!

François