entropy of randomised shortest paths

mpadge commented 7 years ago

jed, pasting your code for future reference and adaptation here (with explicit namespaces so we don't lose our way)

## Gaussian random field with some adjustments
model <- RandomFields::RMgauss(scale=20)
x <- seq(1,200,by=1)
y <- seq(1,100,by=1)
a <- RandomFields::RFsimulate(model=model,x=x,y=y,grid=T)
r <- raster::raster(a)
a <- raster::values(r)
a <- abs(a)^2
a <- a / max(a)*4 + 1
raster::values(r) <- 1/a

#Make TransitionLayer from raster
s1 <- function(x){x[1]}
s2 <- function(x){x[2]}
tr1 <- gdistance::transition(r, s1, 8,symm=F)
tr2 <- gdistance::transition(r, s2, 8,symm=F)
tr <- (tr1 + tr2)/2
tr <- gdistance::geoCorrection(tr,type='c',multpl=FALSE)

sp1 <- c(40,50)
sp2 <- c(160,50)
Pt <- gdistance::passage(tr,sp1,sp2,theta=0,totalNet='net')

RandomFields::plot(r)
RandomFields::plot(log(Pt))

theta is the entropy parameter here (eta in the original 2009 paper). Simulated data can be fitted to empirical data simply by finding the value of theta that produces the statistically best approximation. I'll paste some more code as soon as i get a chance.

mpadge commented 7 years ago

Effects of theta/eta are really unknown and 'unpredictable' (pun intended). theta=0 defaults to standard SP, with results like this:

Pt <- gdistance::passage(tr,sp1,sp2,theta=0.001,totalNet='net')
RandomFields::plot(log(Pt))

theta_0 And randomised SP gives these kinds of results

Pt <- gdistance::passage(tr,sp1,sp2,theta=0.001,totalNet='net')
RandomFields::plot(log(Pt))

theta_0 but theta=5 gives junk. Why 0.01? Why 5? These values have neither scale nor meaning in any absolute sense. What is needed is a way to convert theta to some kind of meaningful scale. It's actually relatively easy for me (with all vector data), but for you that's much harder. Your empirical data will always be vector data, while gdistance::passage is strictly raster, so you need a way to statistically compare simulated raster data with empirical vector data.

jedalong commented 7 years ago

Hey Mark, Completely agree. If theta is the 'free entropy' what do the numbers mean? In theory it is unbounded, but that doesnt mean a lot to me. My first instinct is that theta should be defined as the ratio between the LCP cost and some measure of a time budget so for example the time between A and B. But again you could scale this in any umber of ways. I'm not sure what you mean by my empirical data will always be vector. Most of the empirical data i would be using would be raster (e.g., DEM, Slope, Remotely Sensed indicators), which would be used to derive a cost surface.

mpadge commented 7 years ago

Your cost surface is made of raster data, but what gdistance::passage is doing is effectively giving you a raster average of a very large number of individual vector trajectories (converted to relative frequencies or probabilities). To estimate theta, this Pt field has to be compared to empirical data on animal movements, and these can only be vector trajectories. Does that make sense?

jedalong commented 7 years ago

Yes absolutely correct, but I think for now, imagine we are treating the vector GPS data as being snapped to a grid. With my caribou data this means that each caribou GPS fixed can be snapped to the centre point of the 30mx 30m pixel (DEM and Landsat at this scale) it is within. This seems like a reasonable starting point for most wildlife examples I will encounter where the raster data is ~the GPS error.

With Brownian bridges (see http://onlinelibrary.wiley.com/doi/10.1890/06-0957.1/full) a 'leave-one-out' estimation procedure is used to perform a numerical estimate of the maximum likelihood value for c2. Basically you discard a fix, and then estimate the probability surface based on the two surrounding fixes, and then compute the probability associated with the discarded fix.

I have quickly done this for the fbtgUD model that i presented in Manchester see the likec2 function that i will upload shortly. I think something similar could be done for randomised shortest paths as well (and would be easier from a computational standpoint). This would give an empirical estimate for 'theta' given some known trajectory data but does nothing to solve the interpretation problem.

mpadge commented 7 years ago

Thanks Jed, that leave-one-out approach is indeed really useful. I aim to have a computationally usable interpretation of theta within the coming week, and will work with my masters student on qualitative interpretation, hopefully including mapping to some more intuitive 0-1 scale or something. I'll let you know as soon as it's up and running.

I imagine that one of the things that I aim to do would also be useful in your case, which is to relate theta (or re-scaled version thereof) to average distance travelled. In your models, you'd then just need to plug in the actual distance traversed between A and B, and theta would be scaled to that. Of course you could only do that once for an entire trajectory, otherwise you'd be sub-sampling, but that should be even easier that the leaving-one-out approach. Singular result guaranteed becuase of guaraneteed 1-to-1 relationship between theta and average distance, and statistical bounds could also readily be derived based on numbers of samples and slopes of theta-d relationships. I'm hoping for a paper on that topic alone, but it could well be greatly strengthened through additional application to your knid of "continuous" data. Just a heads up and invitation to collaborate in case you'd be interested in exploring that further.

jedalong commented 7 years ago

Cool Mark, Sounds great it would be great to collaborate on this sort of thing!

mpadge commented 7 years ago

cc'ing @karpfen into these conversations - the aforementioned masters student who's gunning along with the probabilistic router and is keen to be involved too.

jedalong commented 7 years ago

Great to have you involved Andreas

jedalong commented 7 years ago

Hey Mark and Andreas, I have picked this back up and am playing around with trying to estimate animal home ranges that account for landscape structure rather than using simple random walks as in a brownian bridge. Have you guys made any progress on ways to empirically estimate or optimize a theta parameter from some movement trajectory data? If so I would love to hear about it and would be happy to help work out code etc. if there are issues. My current code for estimating home ranges from animal tracking is terribly slow, which is because it calls the passage function from gdistance for every pair of consecutive GPS locations and then estimates the output probability surface on a really fine resolution raster. Need to come up with something easier to diagnose!

mpadge commented 7 years ago

Thanks for the heads-up Jed! Short answer: yes and no. Both of us have been off doing this package. It just uses standard shortest paths, but is massively scalable, and easily routes 1000's of points in the blink of an eye. Having done that, we are pretty much at the stage of being able to incorporate probabilistic routing in it. However ...

I did already give it a try, and it seems that the algorithm of Saerens et al (in its updated version) actually manifests some irresolvable instabilities. This might be why they don't seem to have revolutionised the world they way i expected they would have? It works well for dense graphs, but for our relative sparse street networks, it only remains stable for fairly small graphs. Nevertheless, Andi did manage to churn out a brilliant masters thesis from it that hinted at some interesting inflection behaviour with regard to the theta parameter. We'll report back soon ...

jedalong commented 7 years ago

Right on. Ok id be interested in taking a look at the thesis when all is said and done!

mpadge commented 7 years ago

It's essentially the current vignette, with Fig. 8 being the interesting entropy result.

jedalong / wildlifeTG

entropy of randomised shortest paths #1