Closed by pat-s 4 years ago
@rvalavi it's a shame that we didn't have time to chat yesterday. I didn't know that you were the creator of the blockCV package. But at least you got a better idea of the mlr approach yesterday. Integrating your approach into mlr as well would have the advantage that it could be used with hundreds of different modeling approaches in a streamlined way.
@pat-s Hi Patrick,
Sorry for the late reply. I don't know how I had not received any notification about this comment! That's a great idea! Indeed we tried to create blockCV in a way that could be useful in general for spatial modelling.
Yesterday, @jannes-m gave an impressive presentation at the GEOSTAT workshop in Prague. It's interesting how you implemented spatial cross-validation in the mlr package. I completely agree with you, these functionalities should be available to a wider range of users.
I would be glad to collaborate on this. I am happy to make any changes needed for this integration.
@jannes-m thank you for your great presentations yesterday. Yes, it was an intensive program and there was little time to chat. Great idea! I would be happy to help with integrating the functionality of blockCV into mlr. I will try to add more approaches to the package and maybe, in the future, methods for handling spatio-temporal data.
I also would be glad to hear any comments and new suggestions regarding the current state of the package.
@rvalavi Great that you also see the value here!
A collection of spatial resampling methods that are easily available to users has been missing for too long already!
Your package makes a great start by providing the methods in the first place. To have even more impact, they should be integrated in frameworks like mlr :)
It's probably easiest to have a voice call about all of this. If you're up for it, just write Jannes and me an email to schedule one.
A few comments about the methods:
In mlr we already have "blocking". It means using pre-defined indices in resampling that should not be separated during fold creation.
The user can also set a flag that completely uses the pre-defined indices for partitioning.
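As a rough illustration of the two "blocking" behaviours described above (not mlr's actual implementation): the sketch below keeps all observations that share a pre-defined block id in the same fold, and separately shows the variant where the block ids themselves are the partition. The function names `grouped_folds` and `predefined_folds` and the toy block ids are made up for this example.

```python
from collections import defaultdict


def _members_by_block(block_ids):
    """Map each block id to the list of observation indices it contains."""
    members = defaultdict(list)
    for idx, block in enumerate(block_ids):
        members[block].append(idx)
    return members


def grouped_folds(block_ids, k):
    """Build k folds; observations sharing a block id are never separated."""
    members = _members_by_block(block_ids)
    folds = [[] for _ in range(k)]
    # Greedy balancing: place the largest blocks first, each into the
    # fold that currently holds the fewest observations.
    for block in sorted(members, key=lambda b: (-len(members[b]), str(b))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[block])
    return folds


def predefined_folds(block_ids):
    """Use the block ids directly as the partition: one fold per distinct id."""
    members = _members_by_block(block_ids)
    return [members[b] for b in sorted(members, key=str)]


# Hypothetical block ids for eight observations.
block_ids = ["a", "a", "b", "b", "b", "c", "d", "d"]
folds = grouped_folds(block_ids, k=2)
fixed = predefined_folds(block_ids)
```

With `grouped_folds` the number of folds is chosen freely and the blocks only act as a constraint; with `predefined_folds` the number of folds equals the number of distinct block ids.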
Your "blocking" idea is a bit different as it creates spatial blocks. To integrate it in mlr, we would definitely need to rename it. Not sure how big a problem this would be, as you most likely refer to this name in your publication.
Also you require raster objects to create the spatial blocks. I see the need for it in your example in the vignette. The implementation will not be so simple as we would also need to pass a raster layer in addition to the coordinates.
The "environmental block" is already implemented in mlr following the idea of Brenning (2012).
I am wondering how much confusion multiple spatial clustering methods with only small differences will trigger in the spatial modeling world. You also require a raster layer for the "environmental block" which makes it a bit more difficult.
Since variables with wider ranges of values might dominate the clusters and bias the environmental clustering (Hastie et al., 2009), all the input rasters are first standardized within the function.
You do the clustering on the input variables, while Brenning (2012) uses the coordinates. Have you ever considered the Brenning (2012) approach in detail? Is there an advantage to using the covariates instead of the coordinates for the clustering?
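To make the difference between the two clustering inputs concrete, here is a minimal sketch in plain Python. It is neither blockCV's nor mlr's code: it runs a simple k-means (with a deterministic farthest-point initialisation, chosen only to keep the example reproducible) once on the coordinates and once on standardized covariates, and uses the cluster label as the fold id. The toy coordinates, the covariate values and all function names are assumptions made up for illustration.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(rows, k, iters=50):
    """Lloyd's k-means; the returned cluster label serves as the fold id."""
    # Farthest-point initialisation: start at the first row, then repeatedly
    # add the row that is farthest from all centers chosen so far.
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            pts = [r for r, lab in zip(rows, labels) if lab == c]
            if pts:
                centers[c] = [mean(col) for col in zip(*pts)]
    return labels


# Toy data: two spatial groups of four points each ...
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
# ... whose covariates (temperature, elevation) alternate between two regimes.
covariates = [(10.0, 100), (20.0, 900), (10.5, 120), (19.5, 880),
              (10.2, 110), (20.3, 910), (9.8, 90), (19.9, 920)]

spatial_folds = kmeans_labels(coords, k=2)
environmental_folds = kmeans_labels(standardize(covariates), k=2)
```

In this toy case the coordinate-based folds separate the two spatial groups, while the covariate-based folds group points by environmental regime regardless of where they are located, so the two approaches can produce quite different partitions of the same data.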
Should be the easiest method to implement. This could be a good starting point.
Update: We implemented all of blockCV's resampling functions in mlr3spatiotemporal.
Things are not completely done yet. We still need some examples and have to polish everything. I'll let you know once we are ready. Just FYI, we also support visualization.
Since we would like to release to CRAN at some point, what are your plans regarding this? Since we depend on your pkg, yours would need to go first. Otherwise we would need to copy all of your code to be able to release to CRAN (which I would like to avoid ofc).
Hi @pat-s and @be-marc
Thank you for writing the code. The visualisation looks nice. I had a very quick look at the code, looks good, but I might be able to help you improve it.
I plan to push blockCV to CRAN and will try to do this in the next couple of weeks. I also want to update all spatial functions to sf functions. I don't think this will cause any problems for the mlr SpCV functions.
Please let me know if you need any help.
Regards, Roozbeh
> I had a very quick look at the code, looks good, but I might be able to help you improve it.
Improvements are welcome any time, just open a PR :)
> I plan to push blockCV to CRAN and will try to do this in the next couple of weeks. I also want to update all spatial functions to sf functions. I don't think this will cause any problems for the mlr SpCV functions.
Sounds good, looking forward to it :)
Sure! I will keep you updated :)
Hi @pat-s and @be-marc
After 10 days of coding, blockCV is finally updated! I rewrote almost the whole package from scratch. I tried to keep the package output consistent with the previous version.
Could you check the consistency with your mlr code? I will push it to CRAN as soon as you give me feedback.
FYI: the function spatialBlock can now search for evenly distributed records in training and testing folds for binary and multi-class responses.
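For readers who want an intuition for such a search, the following is only a rough sketch under my own assumptions, not spatialBlock's actual algorithm: it tries random assignments of blocks to folds and keeps the one in which the rarest class/fold combination has the most records. The function names, the scoring rule and the toy data are hypothetical.

```python
import random
from collections import Counter


def balance_score(assignment, block_of, label_of, k):
    """Smallest number of records of any class in any fold (higher is better)."""
    counts = Counter()
    for block, label in zip(block_of, label_of):
        counts[(assignment[block], label)] += 1
    classes = set(label_of)
    # Counter returns 0 for class/fold combinations that never occur.
    return min(counts[(f, c)] for f in range(k) for c in classes)


def round_robin(blocks, k):
    """Assign blocks to folds in order: 0, 1, ..., k-1, 0, 1, ..."""
    return {b: i % k for i, b in enumerate(blocks)}


def search_balanced_folds(block_of, label_of, k, iters=100, seed=42):
    """Random search over block -> fold assignments, keeping the best score."""
    rng = random.Random(seed)
    blocks = sorted(set(block_of))
    # Start from the plain round-robin assignment so the result is never worse.
    best = round_robin(blocks, k)
    best_score = balance_score(best, block_of, label_of, k)
    for _ in range(iters):
        shuffled = blocks[:]
        rng.shuffle(shuffled)
        candidate = round_robin(shuffled, k)
        score = balance_score(candidate, block_of, label_of, k)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy data: twelve presence (1) / absence (0) records in four spatial blocks.
block_of = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
label_of = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]

best, score = search_balanced_folds(block_of, label_of, k=2)
```

The same scoring works unchanged for multi-class responses, since it simply iterates over whatever distinct labels are present.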
Thanks! I'll have it on my list - though I will be busy traveling in the next two weeks so I do not know when I will have time to get to it.
No worries! All the outputs are generated in the same format and with the same names. No argument has changed. So I don't think there will be any inconsistency with mlr.
For all future readers: {blockCV} functions are supported in https://github.com/mlr-org/mlr3spatiotempcv.
Hi RV (and all other authors),
thanks for this great work - a package that is very much needed in the spatial modeling community! Also it is great that you published it on Github and maintain it openly! I am the maintainer of sperrorest and also an author of mlr - a modeling framework similar to biomod2 that you show as the last example in the vignette. While biomod2 is tailored towards the species distribution community, it would be great to also have the functionality in a framework addressing the whole spatial modeling community, i.e. mlr.
In mlr we have had the "k-means clustering" approach from sperrorest integrated for a few months: http://mlr-org.github.io/mlr/articles/tutorial/devel/handling_of_spatial_data.html
mlr is, together with caret, the most popular modeling-framework package in R. It would be great if we could join efforts and integrate parts of the functionality into mlr. Combining efforts is also one of the reasons why we decided to deprecate sperrorest and integrate its functionality into a bigger framework that is maintained by more people. What is your opinion on this?
I am very happy to support you in this direction and help with possible issues.
For an example of using mlr in spatial modeling see also https://arxiv.org/abs/1803.11266.