simonmoulds / lulcc

land use change modelling in R
GNU General Public License v3.0
39 stars 20 forks source link

Package improvement: some ideas #4

Open VLucet opened 4 years ago

VLucet commented 4 years ago

This package has been a truly priceless help for my master's thesis. It has allowed me to understand the basics of land use change modelling, and served as a great introduction to more advance R features for me. I would love to give back to this package and the author(s) behind it by becoming a more substantial contributor. More than that, I really believe in the mission of the package: make LUC modeling more accessible and more transparent. After a full year of reading LUC modelling papers, I have come to think it is really needed and worth pursuing. This is why I think an awesome goal would be for this package to be part of the ROpenSci initiative. I would be very excited and motivated to start working toward that goal!

I understand this might be a lot to ask, but I am ready to put a considerable effort into this. I see so much potential with this package, it would make me sad to not try and achieve it. I see many ways to integrate other packages (R being full of powerful modelling and machine learning packages) and really think this package could become a truly major reference in LUC statistical modelling.

I have a lot of ideas to improve the package and would love to work toward implementing them:

I have done a few of these things with my "toy package" rgovcan (link), but not all of them. I would therefore welcome the opportunity to hone by development skills!

simonmoulds commented 4 years ago

Hi Valentin,

Very pleased to connect with you again, and so pleased to hear that lulcc has been helpful for your research. I'm very happy for you to be a more substantial contributor to the project - as you deduce from the commit history, I do not have much time at the moment to look at this project.

All of your ideas sound good. An idea that I've had is to possibly split the package into either two or three packages, in line with Hadley Wickham's mantra that a package should do one thing well. In line with this advice, lulcc could easily be split into a package for doing the statistical modelling, a package for the allocation, and possibly a package for validation.

Let me know what you think.

Simon

VLucet commented 4 years ago

Hi Simon, thanks for your reply, nice to connect again as well. I am happy to hear positive feedback on these :smiley:. I will get started with PRs, let me know if you wish to give me commit access as well.

Brainstorming a bit here: Splitting the package is an interesting idea. We could imagine a lulccverse made up, like you said, of at least 3 packages.

Regardless of whether or not the packages are split, an interesting approach in my opinion would be to integrate the steps in an API that would look a bit like this:

lulcc_prep(obsStack = lu_stack, varStarck = var_stack, ...) %>% # Preps and process raster data 
  lulcc_model(method = "rf", engine = "ranger") %>% # Or "brm", "glm", "nn"... could wrap parsnip routines
  lulcc_allocate(method = "clue") %>%  # other allocations engines: "dyna-clue"?
  lulcc_validate(method = c("auc", "three_maps")) # list of validation methods
# Outputs returned in a tidy way

Nothing groundbreaking here, the inspiration for this are packages of the sort of tidymodels, especially parsnip, but also frameworks like mlr3. It is true that these frameworks also do follow the design philosophy of "one package does one thing well".

In the interface example above, if the packages are split, the main stat package would take care of prep and model to produce a probability surface, wrapping as many modelling engines as we want. The two other packages would have allocate and validate as their main functions which would have support for piping and therefore building a full workflow if all packages are present. But the packages do not need to be split for the API to look like this.

I think my (humble) opinion on the question at this point in time is probably the following: because the stats and the validation package are, and will probably continue to, mainly be wrapping code from other packages, I am not sure splitting is the right approach, as these package would not stand alone easily. That being said, the allocation package is the most likely among the 3 to be bringing really new things (as the allocation part already does thanks to your C code!).

Maybe the most important mission of lullc is to provide a smart and tidy interface to running the 4 steps of standard land use change modelling: raster prepping (i.e. passing through as.data.frame), probability surface generation, allocation and validation, in one tidy package. It also feels like one such package would be easier to maintain and to use, but I am falling short of previous development experience here!

These are my thoughts - apologies for the length, trying to stay concise, but these are interesting and exciting topics!

simonmoulds commented 4 years ago

OK, that sounds good. Let's focus development within the package for now, but with a clear division between the four stages (prep, model, allocate, validate). Then perhaps we could create some experimental packages at a later date to explore the idea of splitting them more permanently. You mentioned in your previous message the need to include some tests - I completely agree, and this would probably be the first thing to do before we consider major structural changes.

I've just merged your latest pull request. I guess the next stage is to change the name of the repository from r_lulcc to lulcc?

I will give you commit access to the package. Feel free to add yourself to the Author/Maintainer entries in the DESCRIPTION file!

edzer commented 4 years ago

Splitting packages sounds cool, but is a big headache when you want to release things to CRAN: typically changing things in one sub-pkg requires you to change them in all others too, then you'd like to submit them all together but you can't do that.

VLucet commented 4 years ago

@simonmoulds we are on the same page! Thank you for your trust with commit access. I will get some branches started on unit testing files with testthat and on a better readme. For the renaming of the repo, I believe only you can do that as repo owner. @edzer thank you for the comment and advice!

simonmoulds commented 4 years ago

@VLucet: I've changed the repository name. Thanks @edzer for your wisdom - that makes sense and confirms that for now at least we will keep everything under one package. Another change we may consider further down the line is transitioning from raster to stars ;)

edzer commented 4 years ago

Let me know if you need help with the transition!

VLucet commented 4 years ago

@VLucet: I've changed the repository name. Thanks @edzer for your wisdom - that makes sense and confirms that for now at least we will keep everything under one package. Another change we may consider further down the line is transitioning from raster to stars ;)

I was thinking the same for the switch to stars. Would be great :)

edzer commented 4 years ago

Stars has now pretty solid support in mapview and tmap.