noaa-afsc / HfPlMovement

A data package for R providing access to predicted movements of ribbon and spotted seals. Maintained by @jmlondon / josh.london@noaa.gov

fix/remove outlier predictions #1

Open jmlondon opened 4 days ago

jmlondon commented 4 days ago

For both ribbon and spotted seals, there are 'outlier' predictions showing up in the final movement dataset that need to be addressed.

For example, the spotted seal predictions (pl_predict_pts) show points well into the southern hemisphere:

image

Ribbon seal predictions, while remaining in the northern hemisphere, include some relatively extreme smoothed projections:

image

The likely culprits worth investigating:

  1. observations in the raw data that occur outside of the initial deployment date or specified end date
  2. erroneous observations/location estimates
  3. very long time gaps between observed locations. This is the most likely scenario, and it is worth looking at some recent code from Devin Johnson that removes these time gaps by splitting the track into separate segments
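
As a rough illustration of checking the first culprit, here is a minimal sketch that drops observations falling outside a deployment window. It is written in plain Python as a language-neutral sketch and is not code from this package; the function name `within_deployment`, the dates, and the observation times are all hypothetical.

```python
from datetime import datetime

def within_deployment(obs_times, deploy_start, deploy_end=None):
    """Keep only timestamps inside [deploy_start, deploy_end].

    If deploy_end is None, only the start of the window is enforced.
    """
    return [
        t for t in obs_times
        if t >= deploy_start and (deploy_end is None or t <= deploy_end)
    ]

# Hypothetical deployment window and observation times (not real tag data)
deploy_start = datetime(2014, 4, 10)
deploy_end = datetime(2014, 8, 1)
obs_times = [
    datetime(2014, 4, 9, 18),   # before the deployment date
    datetime(2014, 4, 11, 6),
    datetime(2014, 6, 2, 12),
    datetime(2014, 8, 3, 0),    # after the specified end date
]

kept = within_deployment(obs_times, deploy_start, deploy_end)
n_dropped = len(obs_times) - len(kept)
print(n_dropped, "observations fall outside the deployment window")
```

Counting the dropped observations per deployment would be a quick way to see whether this culprit is contributing at all before looking at the time gaps.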

jmlondon commented 1 day ago

@emchuron and I discussed two approaches for handling long time gaps between observed locations:

  1. Split the sequence of observed locations into separate segments a priori. A new segment would start whenever the gap between consecutive observed locations exceeds a specified maximum (e.g. 7 days). Each segment would be fit and predicted (and pseudo tracks generated) independently before merging back as needed
  2. Fit the complete track and identify time gaps post hoc. After fitting the model to the complete set of observations, we would identify gaps using the same maximum-gap rule. Predictions and pseudo tracks would be generated only for the periods outside of the identified gaps
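
To make the maximum-gap rule concrete, here is a minimal sketch of the a priori split from option 1. It is plain Python for illustration only, not the code from Devin Johnson mentioned earlier and not part of this package; `split_by_gap` and the observation times are hypothetical, and the 7-day threshold is just the example value given above.

```python
from datetime import datetime, timedelta

def split_by_gap(obs_times, max_gap=timedelta(days=7)):
    """Split timestamps into segments wherever the gap between
    consecutive observations exceeds max_gap."""
    segments = []
    for t in sorted(obs_times):
        if segments and t - segments[-1][-1] <= max_gap:
            # Close enough to the previous observation: same segment
            segments[-1].append(t)
        else:
            # First observation, or gap too long: start a new segment
            segments.append([t])
    return segments

# Hypothetical track: five daily locations, a 10-day gap, then three more
start = datetime(2014, 5, 1)
obs_times = [start + timedelta(days=d) for d in (0, 1, 2, 3, 4, 14, 15, 16)]

segments = split_by_gap(obs_times)
segment_sizes = [len(s) for s in segments]
print(segment_sizes)
```

The segment sizes are worth inspecting before any fitting, since very short segments are where an independent fit is most likely to struggle.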

The initial plan was to focus on the first approach because it seemed easier to implement and might result in better predictions/pseudo tracks, since the gap periods wouldn't influence the model fit. After some experimentation, though, this approach produces short segments for which the model fit may not converge. Imagine a stretch of 8 days with no observations, followed by 7-10 locations, and then another 8-day gap. Fitting a model to just those 7-10 locations can be unreliable.

In most cases, the large time gaps are not causing poor model fits or convergence issues. Instead, the problem arises on the prediction side, where large, unrealistic correlated loops are generated.
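
Because the trouble is on the prediction side, the post hoc option amounts to dropping prediction times that fall inside a long gap. The sketch below shows that masking step only, in plain Python for illustration; it is not the package's implementation. `find_gaps`, `outside_gaps`, the daily prediction grid, and the observation times are hypothetical, and the model fit itself is not shown.

```python
from datetime import datetime, timedelta

def find_gaps(obs_times, max_gap=timedelta(days=7)):
    """Return (gap_start, gap_end) pairs for consecutive observations
    separated by more than max_gap."""
    times = sorted(obs_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]

def outside_gaps(pred_times, gaps):
    """Keep prediction times that do not fall strictly inside any gap.

    Times equal to a gap endpoint coincide with an observation and are kept.
    """
    return [t for t in pred_times if not any(a < t < b for a, b in gaps)]

# Hypothetical track: three daily locations, a 10-day gap, then two more
start = datetime(2014, 5, 1)
obs_times = [start + timedelta(days=d) for d in (0, 1, 2, 12, 13)]

# Hypothetical daily prediction grid spanning the whole track
pred_times = [start + timedelta(days=d) for d in range(14)]

gaps = find_gaps(obs_times)
kept = outside_gaps(pred_times, gaps)
kept_days = [(t - start).days for t in kept]
print(kept_days)
```

The model still sees every observation under this scheme; only the prediction times are restricted, which avoids fitting a separate model to each short segment.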

So, I think the second approach is the path worth pursuing and here's what's needed to accomplish that

emchuron commented 1 day ago

Sounds good to me! Let me know if I can help at all.