mpadge opened 7 years ago
Effects of theta/eta are really unknown and 'unpredictable' (pun intended). theta=0 defaults to standard SP, with results like this:
```r
Pt <- gdistance::passage(tr, sp1, sp2, theta = 0.001, totalNet = "net")
RandomFields::plot(log(Pt))
```
And randomised SP gives these kinds of results
```r
Pt <- gdistance::passage(tr, sp1, sp2, theta = 0.001, totalNet = "net")
RandomFields::plot(log(Pt))
```
but theta=5 gives junk. Why 0.01? Why 5? These values have neither scale nor meaning in any absolute sense. What is needed is a way to convert theta to some kind of meaningful scale. It's actually relatively easy for me (with all vector data), but for you it's much harder. Your empirical data will always be vector data, while gdistance::passage is strictly raster, so you need a way to statistically compare simulated raster data with empirical vector data.
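For intuition on why raw theta values carry no absolute meaning, here is a toy sketch (illustrative Python, not the gdistance implementation; all costs are made-up numbers): in randomised shortest paths, path probabilities follow a Boltzmann distribution, P(path) ∝ exp(-theta × cost), so theta only has meaning relative to the scale of the cost surface.

```python
import math

def path_probs(costs, theta):
    """Boltzmann distribution over candidate paths:
    P(path) is proportional to exp(-theta * cost(path))."""
    w = [math.exp(-theta * c) for c in costs]
    z = sum(w)
    return [wi / z for wi in w]

# hypothetical costs of three alternative A -> B paths
costs = [2.0, 3.0, 5.0]

uniform = path_probs(costs, 0.0)  # theta = 0: pure random walk, all paths equal
sharp = path_probs(costs, 5.0)    # large theta: mass piles onto the least-cost path

# theta only has meaning relative to the cost scale: doubling all costs
# while halving theta leaves the distribution unchanged
same = path_probs([2 * c for c in costs], 0.5)  # equals path_probs(costs, 1.0)
```

The last line is the whole problem in miniature: rescale the cost surface and the "right" theta changes with it, which is why 0.01 and 5 are meaningless in isolation.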
Hey Mark, completely agree. If theta is the 'free entropy', what do the numbers mean? In theory it is unbounded, but that doesn't mean a lot to me. My first instinct is that theta should be defined as the ratio between the LCP cost and some measure of a time budget, for example the time between A and B. But again, you could scale this in any number of ways. I'm not sure what you mean by my empirical data will always be vector. Most of the empirical data I would be using would be raster (e.g., DEM, slope, remotely sensed indicators), which would be used to derive a cost surface.
Your cost surface is made of raster data, but what gdistance::passage is doing is effectively giving you a raster average of a very large number of individual vector trajectories (converted to relative frequencies or probabilities). To estimate theta, this Pt field has to be compared to empirical data on animal movements, and these can only be vector trajectories. Does that make sense?
Yes, absolutely correct, but I think for now, imagine we are treating the vector GPS data as being snapped to a grid. With my caribou data this means that each caribou GPS fix can be snapped to the centre point of the 30 m × 30 m pixel (DEM and Landsat at this scale) it falls within. This seems like a reasonable starting point for most wildlife examples I will encounter, where the raster resolution is ~the GPS error.
With Brownian bridges (see http://onlinelibrary.wiley.com/doi/10.1890/06-0957.1/full) a 'leave-one-out' procedure is used to numerically estimate the maximum-likelihood value of c2. Basically you discard a fix, estimate the probability surface based on the two surrounding fixes, and then compute the probability associated with the discarded fix.
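That leave-one-out scheme can be sketched numerically. Below is a minimal 1-D illustration in Python (hypothetical data and names, not Jed's likec2 code; GPS-error variance terms are omitted): drop each interior fix, form the Brownian-bridge distribution implied by its two neighbours, and pick the diffusion parameter that maximises the summed log-likelihood of the dropped fixes.

```python
import math

def bridge_loglik(fixes, times, sigma2):
    """Leave-one-out log-likelihood for a 1-D Brownian bridge:
    drop each interior fix, predict it from its two neighbours,
    and score it under the implied normal distribution."""
    ll = 0.0
    for i in range(1, len(fixes) - 1):
        t0, t1, t2 = times[i - 1], times[i], times[i + 1]
        a, x, b = fixes[i - 1], fixes[i], fixes[i + 1]
        alpha = (t1 - t0) / (t2 - t0)
        mu = a + alpha * (b - a)                        # bridge mean at t1
        var = sigma2 * (t2 - t0) * alpha * (1 - alpha)  # bridge variance at t1
        ll += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return ll

# toy 1-D trajectory; a crude grid search stands in for a proper optimiser
fixes = [0.0, 0.9, 2.1, 2.8, 4.2]
times = [0.0, 1.0, 2.0, 3.0, 4.0]
grid = [0.01 * k for k in range(1, 200)]
sigma2_hat = max(grid, key=lambda s2: bridge_loglik(fixes, times, s2))
```

The same skeleton would carry over to randomised shortest paths by replacing the bridge density with the probability of the dropped fix under the Pt surface computed from its neighbours.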
I have quickly done this for the fbtgUD model that I presented in Manchester; see the likec2 function that I will upload shortly. I think something similar could be done for randomised shortest paths as well (and would be easier from a computational standpoint). This would give an empirical estimate for theta given some known trajectory data, but does nothing to solve the interpretation problem.
Thanks Jed, that leave-one-out approach is indeed really useful. I aim to have a computationally usable interpretation of theta within the coming week, and will work with my masters student on qualitative interpretation, hopefully including mapping to some more intuitive 0-1 scale or something. I'll let you know as soon as it's up and running.
I imagine that one of the things that I aim to do would also be useful in your case, which is to relate theta (or a re-scaled version thereof) to average distance travelled. In your models, you'd then just need to plug in the actual distance traversed between A and B, and theta would be scaled to that. Of course you could only do that once for an entire trajectory, otherwise you'd be sub-sampling, but that should be even easier than the leave-one-out approach. A singular result is guaranteed because of the guaranteed 1-to-1 relationship between theta and average distance, and statistical bounds could also readily be derived based on numbers of samples and slopes of the theta-d relationships. I'm hoping for a paper on that topic alone, but it could well be greatly strengthened through additional application to your kind of "continuous" data. Just a heads up and an invitation to collaborate in case you'd be interested in exploring that further.
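That 1-to-1 theta-to-distance mapping can be illustrated on a toy path set (hypothetical Python sketch, assuming the monotone relationship described above): the expected path cost under the Boltzmann weighting decreases monotonically in theta, so a target average distance can be inverted numerically, e.g. by bisection.

```python
import math

def expected_cost(costs, theta):
    """Mean path cost under P(path) proportional to exp(-theta * cost);
    decreases monotonically from mean(costs) at theta = 0 towards min(costs)."""
    c0 = min(costs)
    w = [math.exp(-theta * (c - c0)) for c in costs]  # shifted for numerical stability
    return sum(c * wi for c, wi in zip(costs, w)) / sum(w)

def theta_for_distance(costs, target, lo=0.0, hi=50.0, tol=1e-10):
    """Invert the monotone theta -> mean-distance relationship by bisection,
    returning the theta whose expected path cost matches `target`."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_cost(costs, mid) > target:
            lo = mid  # paths still too long on average: increase theta
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

costs = [2.0, 3.0, 5.0]                     # hypothetical path costs
theta_hat = theta_for_distance(costs, 2.5)  # theta matching a mean distance of 2.5
```

Monotonicity is what guarantees the singular result mentioned above: any observed average distance between min(costs) and mean(costs) pins down exactly one theta.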
Cool Mark, sounds great. It would be great to collaborate on this sort of thing!
cc'ing @karpfen into these conversations - the aforementioned masters student who's gunning along with the probabilistic router and is keen to be involved too.
Great to have you involved, Andreas
Hey Mark and Andreas, I have picked this back up and am playing around with trying to estimate animal home ranges that account for landscape structure rather than using simple random walks as in a Brownian bridge. Have you guys made any progress on ways to empirically estimate or optimize a theta parameter from some movement trajectory data? If so I would love to hear about it and would be happy to help work out code etc. if there are issues. My current code for estimating home ranges from animal tracking is terribly slow, because it calls the passage function from gdistance for every pair of consecutive GPS locations and then estimates the output probability surface on a really fine resolution raster. Need to come up with something easier to diagnose!
Thanks for the heads-up Jed! Short answer: yes and no. Both of us have been off doing this package. It just uses standard shortest paths, but is massively scalable, and easily routes thousands of points in the blink of an eye. Having done that, we are pretty much at the stage of being able to incorporate probabilistic routing in it. However ...
I did already give it a try, and it seems that the algorithm of Saerens et al. (in its updated version) actually manifests some irresolvable instabilities. This might be why they don't seem to have revolutionised the world the way I expected they would have? It works well for dense graphs, but for our relatively sparse street networks it only remains stable for fairly small graphs. Nevertheless, Andi did manage to churn out a brilliant masters thesis from it that hinted at some interesting inflection behaviour with regard to the theta parameter. We'll report back soon ...
Right on. OK, I'd be interested in taking a look at the thesis when all is said and done!
It's essentially the current vignette, with Fig. 8 being the interesting entropy result.
Jed, pasting your code here for future reference and adaptation (with explicit namespaces so we don't lose our way):
theta is the entropy parameter here (eta in the original 2009 paper). Simulated data can be fitted to empirical data simply by finding the value of theta that produces the statistically best approximation. I'll paste some more code as soon as I get a chance.
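As a concrete toy sketch of that fitting procedure (illustrative Python, with made-up costs; the real comparison would be between a passage() surface and snapped GPS frequencies): generate "empirical" path frequencies at a known theta, then recover it by minimising squared error over a grid of candidate values.

```python
import math

def path_probs(costs, theta):
    """Randomised-shortest-path style weighting: P proportional to exp(-theta * cost)."""
    w = [math.exp(-theta * c) for c in costs]
    z = sum(w)
    return [wi / z for wi in w]

def fit_theta(costs, empirical, grid):
    """Return the grid value of theta whose predicted path frequencies
    best match the empirical ones (least squares)."""
    def sse(theta):
        return sum((p - e) ** 2
                   for p, e in zip(path_probs(costs, theta), empirical))
    return min(grid, key=sse)

costs = [2.0, 3.0, 5.0]
observed = path_probs(costs, 1.5)             # stand-in for snapped GPS frequencies
grid = [0.05 * k for k in range(0, 100)]      # candidate theta values in [0, 5)
theta_hat = fit_theta(costs, observed, grid)  # should recover ~1.5
```

Any goodness-of-fit statistic could replace the squared error here; the point is only that the fit reduces to a 1-D search over theta.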