ropensci / statistical-software-review-book

Guide for development and peer-review of statistical software
https://stats-devguide.ropensci.org

Comments on proposed Machine Learning guidelines #81

Open gilbertocamara opened 4 months ago

gilbertocamara commented 4 months ago

First of all, kudos to rOpenSci for your work! I am writing this issue from the perspective of being the lead author of sits, an end-to-end environment for ML/DL analysis of big Earth observation data. The package is on GitHub and has a comprehensive online book. The book allows users to run extended tests and experiments with medium-sized datasets that cannot be hosted on CRAN, using an additional data package. We also provide large training datasets on GitHub to support comparative experiments.

The SRR guidelines for Machine Learning are useful and relevant. We especially concur with guideline ML2.0. Since ML2.0 is elaborated further in ML4.0 and again in ML5.0, could the three be merged?

In sits we use closures to implement guidelines ML2.0/ML4.0/ML5.0, as described in the Technical Annex of the online book, following Chapter 10 of Hadley Wickham's Advanced R. However, many R developers may not be familiar with closures and function factories. It would thus be advisable for rOpenSci to provide examples of how to support these guidelines, or at least pointers to the relevant literature; a minimal sketch of the pattern is given below.
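The sketch below shows the kind of function-factory pattern we have in mind, where model specification, training, and prediction are separate steps. It assumes a generic train/predict workflow; the names (`rf_spec`, `training_samples`) are hypothetical and this is not the sits API.

```r
# Sketch of a function factory (closure) that separates model specification
# from training, in the spirit of Advanced R, Chapter 10.
# All names here (rf_spec, training_samples) are hypothetical, not sits API.

rf_spec <- function(num_trees = 100, mtry = NULL) {
  # The returned closure captures the hyperparameters in its environment.
  function(train_data) {
    stopifnot(is.data.frame(train_data), "label" %in% names(train_data))
    preds <- setdiff(names(train_data), "label")
    model <- randomForest::randomForest(
      x     = train_data[, preds, drop = FALSE],
      y     = as.factor(train_data$label),
      ntree = num_trees,
      mtry  = if (is.null(mtry)) floor(sqrt(length(preds))) else mtry
    )
    # Return a second closure: a prediction function that carries the model.
    function(new_data) {
      predict(model, newdata = new_data[, preds, drop = FALSE])
    }
  }
}

# Usage: specification, training, and prediction are three separate steps.
# spec      <- rf_spec(num_trees = 200)
# predictor <- spec(training_samples)   # training_samples is hypothetical
# labels    <- predictor(new_samples)
```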

We also consider that the ML guidelines place much emphasis on the separation between specification, training and test data. This point appears in ML1.0 to ML1.5 and is reinforced in ML3.0 and its sub-points. From the viewpoint of big Earth observation data, such guidelines are not relevant because most realistic training/test datasets exceed CRAN's package size limits. This limitation has led us to provide data packages on GitHub and to write an online book that uses them. We surmise that most ML/DL packages dealing with big data need documentation hosted outside CRAN.
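For illustration, the data-package pattern looks roughly like this; the package and dataset names below are hypothetical, shown only as a sketch of the approach.

```r
# Large training/test sets live in a separate GitHub-hosted data package
# rather than on CRAN. "yourorg/bigdatapkg" and "training_samples" are
# hypothetical names used only for illustration.

# install.packages("remotes")
remotes::install_github("yourorg/bigdatapkg")

library(bigdatapkg)
data("training_samples", package = "bigdatapkg")
str(training_samples)
```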

Furthermore, in the Earth observation area, the use of training and test data splits for model assessment is discouraged. The community has developed a specific set of best practices for accuracy assessment (see Olofsson et al. (2014), doi:10.1016/j.rse.2014.02.015). These best practices are implemented in sits.
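To give a flavour of those best practices, here is an illustrative sketch of the area-weighted accuracy measures recommended by Olofsson et al. (2014); it is not the sits implementation, and the example numbers are hypothetical.

```r
# Illustrative sketch of area-weighted accuracy (Olofsson et al., 2014).
# error_matrix: sample counts, rows = map classes, cols = reference classes.
# area: mapped area of each class, in the same class order as the rows.

olofsson_accuracy <- function(error_matrix, area) {
  stopifnot(is.matrix(error_matrix),
            nrow(error_matrix) == ncol(error_matrix),
            length(area) == nrow(error_matrix))
  w    <- area / sum(area)                      # mapped area proportions W_i
  n_i  <- rowSums(error_matrix)                 # samples per map class
  prop <- sweep(error_matrix / n_i, 1, w, "*")  # estimated proportions p_ij
  list(
    overall_accuracy   = sum(diag(prop)),
    users_accuracy     = diag(prop) / rowSums(prop),
    producers_accuracy = diag(prop) / colSums(prop)
  )
}

# Example with a 3-class error matrix and mapped areas (hypothetical numbers).
# em <- matrix(c(97,   3,  2,
#                 3, 279,  1,
#                 2,  18, 95), nrow = 3, byrow = TRUE)
# olofsson_accuracy(em, area = c(22353, 1122543, 610228))
```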

We also missed guidelines on model tuning. Regarding item ML3.4 and its sub-items, we consider that requiring developers to provide functions for tuning hyperparameters would be a better approach, especially for deep learning; a sketch of such an interface follows.
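The sketch below illustrates the kind of tuning interface we have in mind, building on the hypothetical `rf_spec()` factory above: a function that searches a user-supplied hyperparameter grid via cross-validation. All names are illustrative, not an existing API.

```r
# Hypothetical sketch of a hyperparameter-tuning function: it evaluates each
# row of a hyperparameter grid by k-fold cross-validation on the training data.

tune_spec <- function(spec_factory, train_data, grid, folds = 5) {
  # spec_factory: a function factory such as rf_spec() above
  # grid: a data frame where each row is one hyperparameter combination
  fold_id <- sample(rep_len(seq_len(folds), nrow(train_data)))
  scores <- vapply(seq_len(nrow(grid)), function(g) {
    acc <- vapply(seq_len(folds), function(k) {
      spec      <- do.call(spec_factory, as.list(grid[g, , drop = FALSE]))
      predictor <- spec(train_data[fold_id != k, , drop = FALSE])
      test      <- train_data[fold_id == k, , drop = FALSE]
      mean(predictor(test) == test$label)   # fold accuracy
    }, numeric(1))
    mean(acc)                               # mean accuracy for this grid row
  }, numeric(1))
  cbind(grid, accuracy = scores)[order(-scores), ]
}

# Usage (hypothetical):
# grid <- expand.grid(num_trees = c(100, 500), mtry = c(2, 4))
# tune_spec(rf_spec, training_samples, grid)
```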

Overall, the SRR Guidelines deserve high praise. The rOpenSci team has provided an excellent service to the community by working hard to develop them. Although the guidelines are aimed at small, focused R packages, they are also relevant to larger packages such as sits.