rvalavi / blockCV

The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments. See
https://doi.org/10.1111/2041-210X.13107
GNU General Public License v3.0
106 stars 22 forks source link

About package dependencies and importing {blockCV} functions #7

Closed pat-s closed 4 years ago

pat-s commented 4 years ago

Hi RV,

in the mlr3 ecosystem we aim to reduce dependencies as much as possible. We would like to use {blockCV} within mlr3spatiotempcv.

However, {blockCV} comes with a lot of packages in IMPORTS. Some of these again have a large dep chain themselves.

I was wondering if most of those packages could maybe be moved to the SUGGESTS section (this one is preferred since it does not put the packages into the dep chain of other packages). The "good practice" of putting most packages into the suggests section came up in the recent years. Usually it comes down to

if you use certain functions only in one/few functions in your package, put them into suggests. This way, other people depending on certain functions of your package won't have to install all the other recursive deps.

{blockCV} is a good example of an R package which comes with a core functionality which is interesting for other packages (the partitioning functions) and some other sugar (the plotting/shiny) parts, which are not.

Below I've summarized some points while reviewing your package. Addressing those will not only make it possible for us to import {blockCV} but also will make a CRAN submission more easy.

Suggests

Depends

This section should only be used if that package is used everywhere in the package and basically nothing is working without it. Attaching a package related to parallel processing is even more dangerous since it might interfere with other parallel backends and the user might not notice that the full Namespace was attached.

You only use pkg {snowfall} in spatialAutoRange() and I'd say it should live in SUGGESTS. Also, the package was not updated since 2013 and in general snow-based parallelization is not recommended anymore. It was superseded by the {parallel} package and its cluster*() functions. If you would ask me, I'd go for the {future} package when it comes to parallelization.

This is probably the most problematic thing currently and I just discovered this now. If a user installs the whole mlr3 package suite and then uses a function from {mlr3spatiotempcv} which uses {blockCV}, {snowfall} will get auto-attached and has the potential to cause a lot of trouble. I might need to take action very soon on this so I'd highly appreciate if you could remove this in the near future.

General

related: #1

Thanks again for this package and let me know if I can help in any way.

pat-s commented 4 years ago

@rvalavi

Catching up on the recent Twitter thread. Do you have plans to integrate these suggestions? If yes, is there a rough ETA for this?

rvalavi commented 4 years ago

@pat-s

You're right, most of the changes are very quick. I only need to check the sp functions to sf and the future package. Unfortunately, I just realised the existence of the future after I had implemented snow (which I regret later). The reason I put that in depends section is that I had a problem with the tests when this is not already loaded.

rvalavi commented 4 years ago

One issue might arise here is that in the spatialBlock function, the ggplot is used anyway. I can make it optional by an argument. Currently, the showBlocks only controls whether to plot it or not (the ggplot object is created by default). If make this optional and ggplot to SUGGESTS solves the problem, I'll do that.

rvalavi commented 4 years ago

The dplyr was because of sf. I had a problem with that earlier. I check it, if no problem, I'll remove it completely.

pat-s commented 4 years ago

@rvalavi The primary idea about moving packages to SUGGESTS is:

If its only used in a few (one) function of the package which is not used in the core functionality of the package, move it to SUGGESTS

So e.g. some users might want to use only "env blocking" and hence will never need the Rstoolbox / ggplot2 dep.

And this applies to pretty much all the functions in your package - they are making use of certain packages which other functions don't need :) Doing so would also reduce the dep chain for packages using blockCV by a lot.

To check if the user has installed the packages for certain functions, it is good practice to have a top level call of

if (!requireNamespace("remotes")) <do something> # usually stop and tell the user to install the package

in the function using suggested packages.

Currently, the showBlocks only controls whether to plot it or not (the ggplot object is created by default).

Yes, just don't do the plotting at all if the arg is FALSE.

The dplyr was because of sf. I had a problem with that earlier. I check it, if no problem, I'll remove it completely.

This should not be. If there is a issue with a minimum req of a {dplyr} version of {sf}, this should be tackled in {sf} (and reported there if you face issues).

Let me know if you have further questions.

rvalavi commented 4 years ago

@pat-s all done!

pat-s commented 4 years ago

Thanks! Looks very good! 😄 You also got rid of {RStoolbox} completely which prunes down the dep chain a lot.

In the future I'll tag you directly :) Good luck for the CRAN submission ;)

rvalavi commented 4 years ago

@pat-s thanks for the useful comments :) That would be appreciated as I rarely check for new messages :D

I cancelled the previous submission to CRAN. I'll submit this new version soon.

HenrikBengtsson commented 4 years ago

Drive-by comment:

@pat-s wrote:

Package {methods} does not need to be listed in IMPORTS since it is within the base R packages

I'm not sure that is correct. It is always safer/more robust to declare all imports you use, even if it is "just" things such as methods::setMethod(). The only package you don't have to declare is {base} itself. All other dependencies should be declared. Declaring "base" packages such as {methods}, {stats}, {tools}, and {utils} is free - they don't count toward any type of quota.

rvalavi commented 4 years ago

@HenrikBengtsson that's true. I received an error when I didn't declare "methods". I tried to fix it, but I didn't find any reason, so I added it. Anyway, blockCV just passed the test and it's on CRAN now. @pat-s