simonmoulds / lulcc

land use change modelling in R
GNU General Public License v3.0
40 stars 21 forks source link

Getting model inputs at different time steps #1

Closed VLucet closed 4 years ago

VLucet commented 5 years ago

I believe there is a flexibility issue when it comes to handling requests of data inputs at different timesteps.

According to the tutorial, the function getPredictiveModelInputData takes on timestep t as an argument. If I understand the code and the documentation correctly it passes t to as.data.frame.ObsLulcRasterStack and as.data.frame.ExpVarRasterList which then calls the hidden function .getExpVarRasterList. This functions works by indexing the layers of the different stacks of variables in ExpVarRasterList objects.

The tutorial seems to suggest that if the timesteps I inputed when creating my ObsLulcRasterStack object are 0, 6, 12, I could also input them in my call to getPredictiveModelInputData , such as:

train.data = getPredictiveModelInputData(obs=obs.lu, 
                     ef=vars.lu, 
                     cells=part.obs.lu[["train"]], 
                     t=6) # or t = 0, or t = 12

This results is a message error (from .getExpVarRasterList) :

Error in .getExpVarRasterList(x@maps, t) : 
  invalid t: dynamic explanatory variables, but no data for this time

The problems comes from the fact that .getExpVarRasterList looks up the index of the layers in the rasterStack with index <- t + 1 and maps[[i]] <- s[[index]]. The call to the function above won't work with instances of t that do not correspond to the index of the rasterstacks.

Changing the function call to match the stacks indices...

train.data = getPredictiveModelInputData(obs=obs.lu, 
                     ef=vars.lu, 
                     cells=part.obs.lu[["train"]], 
                     t=1) # or t = 0, or t = 2

...generates the appropriate error message from as.data.frame.ObsLulcRasterStack:

no observed map at time 't': using setting t = 0 instead

Am I missing something of do you agree this should be fixed? I see three ways:

  1. Using the ranks (or index) of the timesteps in the implementation of the functions, instead of the actual t, i.e. rank(maps@t) ; and modify .getExpVarRasterList so it matches the rank of the layers in the rasterstacks with the rank in the timesteps. Indeally, we could add a slot in the object definition of ObsLulcRasterStack objects that stores that rank (or index).
  2. Asking the user to input sequences of integers as timesteps t = c(1,2,3) when they define their ObsLulcRasterStack objects, but we loose the original intent to be able to keep the number of years (or months, or whatever) between timesteps as an implicit input.
  3. Have a more seamless integration of timesteps so that both types of input t = c(0,1,2) and t = c(0,6,12) works regardless (which would be a mix of the solutions above and my preferred solution)

In addition we could consider having the fact that getPredictiveModelInputData takes on timestep t as an argument to be documented in the man page. Also the fact that the data corresponding to the first timestep is the default output.

I'd love to be able to contribute and offer a fix but I do not feel completely confident yet in about if this is indeed recognized as an issue and about which solution should be favored.

simonmoulds commented 5 years ago

Thanks for the comment, Vincent - it's really useful to have someone looking into the code. Could you confirm that the dynamic variables in your ExpVarRasterList object contains a layer for each timestep? I think this restriction is documented, although I would agree it's annoying - in the experimental version of lulcc (simonmoulds/r_lulcc2) I have removed this requirement.

If your ExpVarRasterList object is valid in the above sense then I would agree a fix is required. Your proposal (3) would seem sensible, but I would like to investigate the problem a bit further myself before going ahead - can you share a MWE?

VLucet commented 5 years ago

Thanks for the reply, Simon. I can confirm my that the dynamic variables in my ExpVarRasterList object contains a layer for each timestep - I think I followed the naming guidelines properly. Here is a small example I created to confirm my doubts:

library(lulcc)

lu1 = raster(matrix(1:2, 80, seq(50, 80, 10)))
lu2 = raster(matrix(1:2, 80, seq(50, 80, 10)))

stackoflu = stack(lu1,lu2)

va1.1 = raster(matrix(1:10, 80, seq(50, 80, 10)))
va1.2 = raster(matrix(10:19, 80, seq(50, 80, 10)))
var2 = raster(matrix(50:59, 80, seq(50, 80, 10)))

stackofvar = stack(va1.1,va1.2, var2)
names(stackofvar) = c("di_01_000", "di_01_010","cli_02")

lustack = ObsLulcRasterStack(x = stackoflu,
                             categories = c(1,2), 
                             labels = c("Built", "Cropland"),
                             t = c(0,6)) # modify this value for testing purposes

varstack = ExpVarRasterList(stackofvar)

part = partition(x = lustack[[1]], size = 0.01, spatial = TRUE)
train = getPredictiveModelInputData(obs=lustack, 
                                    ef=varstack, 
                                    cells=part[["train"]],
                                    t=1) # modify this value for testing purposes
train
VLucet commented 5 years ago

I am happy to work on a change on this if it is agreed that it needs to be fixed! 😃

simonmoulds commented 5 years ago

Valentin, I agree that a fix is required. If you have time to work on one I would be most grateful!

VLucet commented 5 years ago

I have forked the repo to write a patch. But for some reason I have been unable to re-document and re-build from source to be able to test my changes. Ive never worked with packages having compiled code, and I think the problem comes from the DLLs. I will try to work around that by sourcing the modified files instead. Any advice on building and testing locally? Here is one of the error I keep getting when documenting/building:

Error in dyn.load(dllfile) : unable to load shared object '/Users/vlucet/Documents/DocumentsMBP/GitHub/r_lulcc/lulcc/src/lulcc.so': dlopen(/Users/vlucet/Documents/DocumentsMBP/GitHub/r_lulcc/lulcc/src/lulcc.so, 6): no suitable image found. 
Did find: /Users/vlucet/Documents/DocumentsMBP/GitHub/r_lulcc/lulcc/src/lulcc.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00 
/Users/vlucet/Documents/DocumentsMBP/GitHub/r_lulcc/lulcc/src/lulcc.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
simonmoulds commented 5 years ago

Hi Valentin,

I'm not 100% sure, but you may have better luck if you delete all the shared object (src/*.so) files before attempting to build the package.

For building and testing I use the devtools package; specifically devtools::check() and devtools::build().

Let me know how you get on!

Simon

VLucet commented 4 years ago

Well this is a little embarrassing... it took a year for me to find the time (but more importantly to build my understanding of how S4 and more generally how R packages work...) to get back to this, but I have generated a pull request for you to review. I ended up not implementing (3) because in the case of a rank being equal to a timestep that is not in this rank, the input becomes ambiguous and requires the need to set up an additional argument rank in getPredictiveModelInputData. I.e. when t = c(0,2,4) the ranks are c(0,1,2) and we get into trouble. See #3