Generating generic datasets

barnabasharris commented 2 years ago

Hi Joseph, Thought it might make sense to continue our discussion re generating generic 'movement corridor' maps for different regions. here.

I was thinking of using a randomly distributed set of start points and then producing n random destination points at various distance intervals, as in Llobera's (2015) paper: 'Working the digital: some thoughts from landscape archaeology'.

Also thought about creating a grid (or grids, based on density) of points and then calculating LCPs between neighbouring points only. So corridors between nodal locations basically.

Also perhaps worth calculating with / without rivers as barriers (and conversely, possibly rivers as highly conductive too?)

Do you have thoughts on the best cost surface algorithm etc.?

josephlewis commented 2 years ago

Hi @barnabasharris,

Thanks for the question.

(1) I had implemented the approach introduced in the Llobera (2015) in previous versions of leastcostpath (the create_banded_lcps function). If there is a want, I can implement it within the newer version of leastcostpath

(2) That could work. If I understand correctly, this is referring to the from-everywhere-to-everywhere approach introduced by White and Barber (2012) (10.1016/j.jas.2012.04.017)

(3) It's definitely worth thinking about incorporating rivers as barriers / non-barriers. I think it depends on the reasoning behind calculating generic 'movement corridors'. If the purpose is to look at just land movement, then rivers as barriers seems appropriate (though an open question is which rivers to include as barriers, etc). If looking at only water-based movement, then rivers as non-barriers and land as a barrier seems appropriate (though general movement will very likely be limited). If the purpose is to explore for example trade of goods then incorporating both land and rivers as non-barriers makes sense. Current approaches of this generally include rivers as being cost from land-based cost functions multiplied by some factor. There's definitely scope to expand this / enrich this with more detailed cost information using water-based cost functions (be that from ethnographic studies, e.g. using canoes, or some equation-based simulation).

In general, I would propose Campbell 2019 for the fact that this is based on a sample size much larger than other cost functions (assuming that you're measuring time). Though if you want to align with other archaeological studies, Tobler's hiking function might be the way to go. There are some philosophical/methodological issues with trying to choose 'one' best cost function (see here for a working paper, https://josephlewis.github.io/2022_10_04_Explanation_and_Robustness.pdf), but this shouldn't really matter for what you're trying to do.

Hope that helps. Happy to expand. I've had some ideas around this general topic in the past, just not had the time to explore properly!

barnabasharris commented 2 years ago

Hey, Cheers for that info and the links, have read around a bit now.

1) As a new {leastcostpath} user, and someone fairly new to LCPs generally, I don't have a good sense of what further functionality might be appreciated. I guess it depends on how popular the discussed approach is. It's not too tricky to whip up a script that produces the above points though so it wouldn't be a major stumbling block for me at least.

2) My thought was not quite FETE, rather more of a nearest neighbour approach (i.e. LCP is calculated from / to each grid point and it's neighbours only, rather than all other grid points). My hope was that this would reduce the total number of calculations required for very large datasets, though I admit I haven't fully considered the implications of this approach and thus it might be better to stick to a more well-trodden approach especially when attempting to generate a generic dataset. Quick technical question here -- is there any significant time saving made by setting create_lcp(..., directional=FALSE) over running two calls of create_lcp(..., directional=TRUE)?

3) I had a sense this might be an open (but interesting!) discussion. I think a purely terrestrial-based approach would make a useful start point but generating some kind of dataset which incorporates rivers as route ways would be desirable too. Especially considering the history of portage in Britain (e.g. https://onlinelibrary.wiley.com/doi/ab/10.1111/j.1468-0092.1996.tb00083.x). I'm sure one of my colleagues has produced a medieval 'navigable rivers of Britain' GIS dataset at some point which I could dig out.

Thanks for the advice on general cost functions too. I'll hopefully produce something more concrete to look at in the coming weeks.

josephlewis commented 2 years ago

It would reduce the number of calculations, but this would be based on the assumption that only those neighbourhood points would be connected. This can somewhat be argued when identifying routes between known locations (e.g. settlements) but less so when trying to identify corridors of movement within the landscape. Though as an aside you might want to look into circuitscape.

The only difference between directional = FALSE and directional = TRUE is that when FALSE the LCP is calculated from origin to destination and destination to origin. Though newer versions (i.e. from github rather than CRAN) have depreciated this functionality (largely for simplicity)

Agreed. You might find https://archaeologydataservice.ac.uk/archives/view/inlandnav_lt_2019/ useful

barnabasharris commented 2 years ago

Nice -- thanks for that. I feel like there must surely be some wiggle room from a computational demand perspective in terms of the FETE approach, however. To take an extreme example, is it worth the time computing the LCP from Aberdeen to Land's End as opposed to a point included in n neighbours?

What I was getting at in terms of the directional = FALSE question was whether, in terms of computation time, it was quicker than calling create_lcp() twice with the origin and destination points reversed? I.e. is any information recycled, or is (was) the same calculation with the points reversed simply called twice internally? Not a biggie really, just looking to optimise.

Thanks for that ADS link too.

josephlewis commented 2 years ago

I understand now - definitely some wiggle room. I guess it's whether based on your expertise you expect long-distance travel between locations x distance away, or whether this movement is from sites x distance away that happen to then connect to form long-distance routes

Ah, I see. It was calling the function twice. Nothing smart about it!

barnabasharris commented 2 years ago

Quick question re scaling this up.

The manual page for create_lcp() specifies origin and destination should be: sf of geometry type 'POINT' or 'MULTIPOINT'. My thought was that this means LCPs for multiple locations can be calculated in a single call? But the below reprex indicates that only the first geometry is used regardless of geometry type.

See following reprex:

remotes::install_github('josephlewis/leastcostpath')
library(sf)
library(elevatr)
library(dplyr)
library(terra)
library(leastcostpath)
library(purrr)
set.seed(123)

wgsLoc <- data.frame(x=-2.00715,y=51.38306) %>% st_as_sf(coords=c('x','y'),crs=4326)
origin <- st_transform(wgsLoc,27700)

dem <- 
  elevatr::get_elev_raster(wgsLoc,z = 8) %>% 
  as(.,'SpatRaster') %>% 
  terra::project(.,y = "epsg:27700")

destinations.point <- origin %>% 
  st_buffer(20000) %>% 
  st_bbox %>% 
  st_as_sfc %>% 
  st_sample(10) %>% 
  st_sf

destinations.multipoint <- 
  destinations.point %>% 
  summarise() %>% 
  st_cast('MULTIPOINT')

plot(dem)
plot(origin,add=T,col='red')
plot(destinations.point,add=T)

# destinations as single point df return lcp for first point
cs <- create_slope_cs(dem)
lcp <- create_lcp(cs, origin, destination=destinations.point)
plot(lcp$geometry,add=T)

# destinations as multipoint df return lcp for first point also
cs <- create_slope_cs(dem)
lcp <- create_lcp(cs, origin, destination=destinations.multipoint)
plot(lcp$geometry,add=T)

# can iterate over rows of single point df
system.time({
  lcps.org <- 1:nrow(destinations.point) %>% 
    map(~create_lcp(cs, origin, destination=destinations.point[.x,])) %>% 
    bind_rows()
})
>  user  system elapsed 
  6.004   0.287   6.272 

plot(lcps$geometry,add=T)

Is this the expected behaviour? I note that the final method shown above of iterating over the create_lcp() call (similar to the way create_FETE_lcps() is structured) means that the underlying igraph object is generated multiple times etc. Thus, would it be possible to optimise create_lcp() further by reducing calls to this function and making use of the igraph::shortest_paths() vectorisation capacity? Something like:

# from create_lcp()

system.time({
  x <- cs
  cs_rast <- terra::rast(nrow = x$nrow, ncol = x$ncol, xmin = x$extent[1], xmax = x$extent[2], ymin = x$extent[3], ymax = x$extent[4],crs = x$crs)

  from_coords <- sf::st_coordinates(origin)[1, 1:2, drop = FALSE]
  # to_coords <- sf::st_coordinates(destination)[1, 1:2, drop = FALSE]      # allow all rows of points df
  to_coords <- sf::st_coordinates(destinations.point)[, 1:2, drop = FALSE]

  from_cell <- terra::cellFromXY(cs_rast, from_coords)
  to_cell <- terra::cellFromXY(cs_rast, to_coords)

  cm_graph <- igraph::graph_from_adjacency_matrix(x$conductanceMatrix, mode = "directed", weighted = TRUE)

  igraph::E(cm_graph)$weight <- (1/igraph::E(cm_graph)$weight)

  lcp_graph <- igraph::shortest_paths(cm_graph, from = from_cell, to = to_cell, mode = "out")
  # lcp_cells <- unlist(lcp_graph$vpath) # need to iterate over multiple paths

  lcps.new <- 
    lcp_graph$vpath %>% 
    map(~terra::xyFromCell(cs_rast, as.integer(.x))) %>% 
    map(~sf::st_sf(geometry = sf::st_sfc(sf::st_linestring(.x)), crs = x$crs)) %>% 
    bind_rows()
})

>  user  system elapsed 
  0.491   0.000   0.491

lcps.new$geometry==lcps.orig$geometry

> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

josephlewis commented 2 years ago

Thanks for that.

The original intention was to create only one LCP from origin to destination but it makes sense to allow for multiple LCPs to be calculated if multipoints are provided. I'll work on implementing this over the next few weeks

josephlewis commented 2 years ago

@barnabasharris

Version 2.0.5 create_lcp has now been amended to allow for multiple destination points. See also #13 which extends what object classes can be used.

Hope that helps! I'll add a comment to reflect your suggestion / implementation :)

barnabasharris commented 1 year ago

Nice one!

As a side note, I also recently discovered the {cppRouting} package which is considerably faster when running multiple leastcostpath queries on igraph type objects.

I've incorporated it into my own work flow by converting the igraph graph object immediately into a cppRouting graph as part of the create cost surface function call. It does add on quite a bit of processing time (especially for larger graphs) but if the user plans of creating many thousands / millions of least cost paths then it pays off I reckon.

Cheers

josephlewis / leastcostpath

Generating generic datasets #10