Closed mikemahoney218 closed 1 month ago
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.
:rocket:
Editor check started
:wave:
Another quick note -- I will not be able to transfer this repository to rOpenSci if accepted. I had asked on Slack and was told by Yani that this was acceptable, though it's only mentioned in the book here. I want to flag this at the start, in case it turns out to be an issue!
git hash: 694d2a5f
Important: All failing checks above must be addressed prior to proceeding
(Checks marked with :eyes: may be optionally addressed.)
Package License: Apache License (>= 2)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:---------------|------:|
|internal |base | 175|
|internal |rsi | 34|
|internal |methods | 3|
|internal |stats | 2|
|internal |tools | 2|
|imports |rlang | 10|
|imports |terra | 10|
|imports |rstac | 5|
|imports |glue | 4|
|imports |future.apply | 2|
|imports |httr | 1|
|imports |sf | 1|
|imports |jsonlite | NA|
|imports |lifecycle | NA|
|imports |proceduralnames | NA|
|imports |tibble | NA|
|suggests |progressr | 1|
|suggests |curl | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |withr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
- **base**: names (20), c (19), class (10), lapply (9), length (8), vapply (8), args (6), formals (6), list (6), mget (6), tryCatch (5), file.path (4), for (4), max (4), min (4), options (4), tempfile (4), ifelse (3), url (3), all (2), call (2), character (2), drop (2), eval (2), is.null (2), nrow (2), paste (2), paste0 (2), unlist (2), with (2), col (1), data.frame (1), dirname (1), get (1), grep (1), mapply (1), merge (1), ncol (1), numeric (1), readLines (1), replicate (1), seq_len (1), source (1), str2lang (1), suppressWarnings (1), t (1), tempdir (1), tolower (1), toupper (1), vector (1)
- **rsi**: build_progressr (5), spectral_indices (3), extract_urls (2), remap_band_names (2), alos_palsar_mask_function (1), calc_scale_strings (1), calculate_indices (1), check_indices (1), check_type_and_length (1), default_query_function (1), download_web_indices (1), filter_bands (1), filter_platforms (1), get_alos_palsar_imagery (1), get_dem (1), get_landsat_imagery (1), get_naip_imagery (1), get_rescaling_formula (1), get_sentinel1_imagery (1), get_sentinel2_imagery (1), get_stac_data (1), is_pc (1), landsat_mask_function (1), maybe_sign_items (1), set_gdalwarp_extent (1), spectral_indices_url (1)
- **rlang**: arg_match (4), caller_env (2), warn (2), exec (1), new_environment (1)
- **terra**: rast (5), sprc (2), crs (1), nlyr (1), predict (1)
- **rstac**: assets_url (2), items_datetime (2), stac_search (1)
- **glue**: glue (4)
- **methods**: is (3)
- **future.apply**: future_lapply (2)
- **stats**: predict (1), setNames (1)
- **tools**: file_ext (1), R_user_dir (1)
- **httr**: user_agent (1)
- **progressr**: progressor (1)
- **sf**: st_bbox (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 15 files) and
- 1 authors
- 3 vignettes
- 5 internal data files
- 11 imported packages
- 21 exported functions (median 19 lines of code)
- 62 non-exported functions in R (median 15 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |    15|       73.0|           |
|files_vignettes          |     3|       92.4|           |
|files_tests              |     9|       89.6|           |
|loc_R                    |  1437|       77.1|           |
|loc_vignettes            |   453|       75.7|           |
|loc_tests                |   853|       84.6|           |
|num_vignettes            |     3|       94.2|           |
|data_size_total          | 26424|       76.4|           |
|data_size_median         |  4831|       74.3|           |
|n_fns_r                  |    83|       71.4|           |
|n_fns_r_exported         |    21|       68.8|           |
|n_fns_r_not_exported     |    62|       73.1|           |
|n_fns_per_file_r         |     3|       55.1|           |
|num_params_per_fn        |     7|       85.3|           |
|loc_per_fn_r             |    15|       46.1|           |
|loc_per_fn_r_exp         |    19|       44.7|           |
|loc_per_fn_r_not_exp     |    15|       49.5|           |
|rel_whitespace_R         |    10|       63.3|           |
|rel_whitespace_vignettes |    32|       76.2|           |
|rel_whitespace_tests     |    13|       73.5|           |
|doclines_per_fn_exp      |    51|       64.2|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    36|       59.4|           |

---
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/Permian-Global-Research/rsi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Permian-Global-Research/rsi/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 8486854009|Lock Threads |success |694d2a | 156|2024-03-30 |
| 8482543797|pages build and deployment |success |1de384 | 65|2024-03-29 |
| 8482287996|pkgdown |success |694d2a | 135|2024-03-29 |
| 8482287992|R-CMD-check |success |694d2a | 131|2024-03-29 |
| 8482287990|R-CMD-check-hard |success |694d2a | 131|2024-03-29 |
| 8482287998|test-coverage |success |694d2a | 131|2024-03-29 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

ERROR: Test Coverage Failed

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
get_stac_data | 44
stack_rasters | 28
check_type_and_length | 25

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 127 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 4
Lines should not be more than 80 characters. | 123
:heavy_multiplication_x: The following 2 function names are duplicated in other packages:

- `calculate_indices` from ClusterStability
- `sign_planetary_computer` from rstac
|package |version |
|:--------|:--------|
|pkgstats |0.1.3.11 |
|pkgcheck |0.1.2.21 |
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
Guessing covr fails due to not setting my custom `is_covr` environment variable:
https://github.com/Permian-Global-Research/rsi/commit/c62e6e9aa08c06a54094edc6c3df994260d8aa17
This environment variable is used to skip a test on my covr CI. The tl;dr is that rsi executes some code in a minimal environment to protect against malicious code downloaded from the internet, which prevents covr from injecting its tracking inside that minimal environment. I still want the file to be tested (and the other pieces of the file to be counted in coverage), though, so I wrapped the local-environment section in `nocov` markers and added this environment variable.
You can see my code coverage report at https://app.codecov.io/gh/Permian-Global-Research/rsi and my CI workflow for this at https://github.com/Permian-Global-Research/rsi/blob/main/.github/workflows/test-coverage.yaml
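For readers, the toggle itself is just an ordinary environment-variable check. Here is a minimal sketch of the pattern (the helper name `skip_if_covr` is hypothetical, not rsi's actual test code; `is_covr` is the made-up variable described above, set only in the package's own test-coverage workflow):

```r
# Hypothetical sketch: skip sandbox-incompatible tests when a custom flag
# is set. `is_covr` is the author's ad-hoc variable, not a covr feature.
skip_if_covr <- function() {
  if (identical(Sys.getenv("is_covr"), "true")) {
    testthat::skip("covr cannot instrument code run in the minimal environment")
  }
  invisible(TRUE)
}
```

A test that exercises the sandboxed code path would call `skip_if_covr()` as its first line, so it still runs under plain `R CMD check` but is skipped under the coverage workflow.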
@ropensci-review-bot assign @jhollist as editor
Assigned! @jhollist is now the editor
@mikemahoney218 Been swamped these last few days. I will work on digging through this today and tomorrow and get back to you soon and should hopefully be ready to start finding reviewers.
I am looking forward to this review. Does look like an interesting package!
No worries, and thanks for the update!
@ropensci-review-bot check rsi
I'm sorry human, I don't understand that. You can see what commands I support by typing:
@ropensci-review-bot help
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
Just want to flag that I'm still expecting (your version of) covr to fail, due to https://github.com/ropensci/software-review/issues/636#issuecomment-2028086514
The core issue is that `calculate_indices()` is basically intended to run code from a random site on the internet -- a trusted site, but still a security risk. As such, that downloaded code is run inside a minimal environment that prevents injecting any unexpected code or functions. Unfortunately, covr works by injecting unexpected functions into your source code and then counting how many times those functions get run -- `calculate_indices()` doesn't allow those to execute, causing the function to fail.
As a result, I toggle the tests that hit this code path using a custom `is_covr` variable, which isn't set by your covr check (because I just made it up; I don't know of a supported way to do this), and so your covr check fails.
I don't want to exclude the whole .R file from covr, because covr can instrument the rest of the file, and I don't want to drop these tests (or make them off by default) because I'd like `R CMD check` to check this function. I've got a live coverage report running via GHA, hopefully viewable at:
https://app.codecov.io/gh/Permian-Global-Research/rsi?branch=main
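To illustrate the mechanism for readers, here is a toy sketch of the general idea (my own illustration, not rsi's actual implementation -- the band names and the whitelist are invented): a downloaded formula is evaluated in an environment whose parent is empty, so only explicitly whitelisted functions resolve.

```r
# Toy sketch: evaluate untrusted formula text in a minimal environment.
# Only whitelisted operators plus the band values are visible; anything
# else -- including system() -- fails to resolve.
formula_text <- "(B08 - B04) / (B08 + B04)"  # e.g. NDVI
allowed <- list(`-` = `-`, `+` = `+`, `/` = `/`, `(` = `(`)
env <- list2env(c(allowed, list(B04 = 0.2, B08 = 0.6)), parent = emptyenv())

eval(str2lang(formula_text), env)  # arithmetic works: 0.5
try(eval(str2lang('system("echo something bad")'), env))
# Error: could not find function "system"
```

This is also why covr's injected tracking calls fail inside the sandbox: they are not on the whitelist either.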
git hash: e71186f2
Important: All failing checks above must be addressed prior to proceeding
(Checks marked with :eyes: may be optionally addressed.)
Package License: Apache License (>= 2)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:---------------|------:|
|internal |base | 175|
|internal |rsi | 34|
|internal |methods | 3|
|internal |stats | 2|
|internal |tools | 2|
|imports |rlang | 10|
|imports |terra | 10|
|imports |rstac | 5|
|imports |glue | 4|
|imports |future.apply | 2|
|imports |httr | 1|
|imports |sf | 1|
|imports |jsonlite | NA|
|imports |lifecycle | NA|
|imports |proceduralnames | NA|
|imports |tibble | NA|
|suggests |progressr | 1|
|suggests |curl | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |withr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
- **base**: names (20), c (19), class (10), lapply (9), length (8), vapply (8), args (6), formals (6), list (6), mget (6), tryCatch (5), file.path (4), for (4), max (4), min (4), options (4), tempfile (4), ifelse (3), url (3), all (2), call (2), character (2), drop (2), eval (2), is.null (2), nrow (2), paste (2), paste0 (2), unlist (2), with (2), col (1), data.frame (1), dirname (1), get (1), grep (1), mapply (1), merge (1), ncol (1), numeric (1), readLines (1), replicate (1), seq_len (1), source (1), str2lang (1), suppressWarnings (1), t (1), tempdir (1), tolower (1), toupper (1), vector (1)
- **rsi**: build_progressr (5), spectral_indices (3), extract_urls (2), remap_band_names (2), alos_palsar_mask_function (1), calc_scale_strings (1), calculate_indices (1), check_indices (1), check_type_and_length (1), default_query_function (1), download_web_indices (1), filter_bands (1), filter_platforms (1), get_alos_palsar_imagery (1), get_dem (1), get_landsat_imagery (1), get_naip_imagery (1), get_rescaling_formula (1), get_sentinel1_imagery (1), get_sentinel2_imagery (1), get_stac_data (1), is_pc (1), landsat_mask_function (1), maybe_sign_items (1), set_gdalwarp_extent (1), spectral_indices_url (1)
- **rlang**: arg_match (4), caller_env (2), warn (2), exec (1), new_environment (1)
- **terra**: rast (5), sprc (2), crs (1), nlyr (1), predict (1)
- **rstac**: assets_url (2), items_datetime (2), stac_search (1)
- **glue**: glue (4)
- **methods**: is (3)
- **future.apply**: future_lapply (2)
- **stats**: predict (1), setNames (1)
- **tools**: file_ext (1), R_user_dir (1)
- **httr**: user_agent (1)
- **progressr**: progressor (1)
- **sf**: st_bbox (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 15 files) and
- 1 authors
- 3 vignettes
- 5 internal data files
- 11 imported packages
- 21 exported functions (median 19 lines of code)
- 62 non-exported functions in R (median 15 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |    15|       73.0|           |
|files_vignettes          |     3|       92.4|           |
|files_tests              |     9|       89.6|           |
|loc_R                    |  1437|       77.1|           |
|loc_vignettes            |   499|       78.0|           |
|loc_tests                |   853|       84.6|           |
|num_vignettes            |     3|       94.2|           |
|data_size_total          | 26424|       76.4|           |
|data_size_median         |  4831|       74.3|           |
|n_fns_r                  |    83|       71.4|           |
|n_fns_r_exported         |    21|       68.8|           |
|n_fns_r_not_exported     |    62|       73.1|           |
|n_fns_per_file_r         |     3|       55.1|           |
|num_params_per_fn        |     7|       85.3|           |
|loc_per_fn_r             |    15|       46.1|           |
|loc_per_fn_r_exp         |    19|       44.7|           |
|loc_per_fn_r_not_exp     |    15|       49.5|           |
|rel_whitespace_R         |    10|       63.3|           |
|rel_whitespace_vignettes |    33|       79.8|           |
|rel_whitespace_tests     |    13|       73.5|           |
|doclines_per_fn_exp      |    51|       64.2|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    36|       59.4|           |

---
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/Permian-Global-Research/rsi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Permian-Global-Research/rsi/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 8572262292|Commands |skipped |bc9e46 | 46|2024-04-05 |
| 8639486278|Lock Threads |success |e71186 | 168|2024-04-11 |
| 8575408691|pages build and deployment |success |f75519 | 69|2024-04-05 |
| 8575285598|pkgdown |success |e71186 | 140|2024-04-05 |
| 8575285595|R-CMD-check |success |e71186 | 135|2024-04-05 |
| 8575285599|R-CMD-check-hard |success |e71186 | 135|2024-04-05 |
| 8575285597|test-coverage |success |e71186 | 135|2024-04-05 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

ERROR: Test Coverage Failed

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
get_stac_data | 44
stack_rasters | 28
check_type_and_length | 25

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 127 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 4
Lines should not be more than 80 characters. | 123
:heavy_multiplication_x: The following 2 function names are duplicated in other packages:

- `calculate_indices` from ClusterStability
- `sign_planetary_computer` from rstac
|package |version |
|:--------|:--------|
|pkgstats |0.1.3.11 |
|pkgcheck |0.1.2.21 |
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
@mikemahoney218 Thanks for the update. And, as you expected, the bot checks fail. I am not too worried about that given you have good coverage and can demonstrate it with the codecov reports. We will want to make sure that this coverage is reported in the repository's README. You may already be doing that, I just haven't checked yet!
Stay tuned, I am working on this today and tomorrow and expect to move on to finding reviewers shortly after that!
I think we are ready to pass on to reviewers! Nice Job!
Only one very small request:
I'd like to see a more upfront CONTRIBUTING file. I always get lost trying to find them when they're embedded inside `.github`. A simple link to that file should suffice!
Also, I have no concerns about the tests failing on the bot. You have implemented them well, the coverage is good, and the badge makes it easy to find.
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/636_status.svg)](https://github.com/ropensci/software-review/issues/636)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
Added the link to CONTRIBUTING!
@ropensci-review-bot assign @mdsumner as reviewer
@mdsumner added to the reviewers list. Review due date is 2024-05-09. Thanks @mdsumner for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@mdsumner: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot set due date for @mdsumner to 2024-05-24
Review due date for @mdsumner is now 24-May-2024
@mikemahoney218 Just wanted to quickly touch base. I am still working on finding a 2nd reviewer. Hopefully I will hear back with a positive response soon from my latest inquiry. Sorry for the delay!
@ropensci-review-bot set due date for @mdsumner to 2024-06-19
Review due date for @mdsumner is now 19-June-2024
@ropensci-review-bot assign @OldLipe as reviewer
@OldLipe added to the reviewers list. Review due date is 2024-07-03. Thanks @OldLipe for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@OldLipe: If you haven't done so, please fill this form for us to update our reviewers records.
:calendar: @OldLipe you have 2 days left before the due date for your review (2024-07-03).
@ropensci-review-bot set due date for @OldLipe to 2024-07-09
Review due date for @OldLipe is now 09-July-2024
@mdsumner and @OldLipe, hoping to get these reviews in so that @mikemahoney218 can start working on revisions. Hopefully next week works for you both to submit the reviews.
Please let me know if you have any questions.
Hello, I'm very sorry! My review is in draft form and I'm still working on it. Recent weeks really got away from me for various reasons, I'm currently making progress and hoping to have a first response in coming days.
The package includes all the following forms of documentation: `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 5
Thank you for the opportunity to review the `rsi` package. This package provides valuable functions for downloading satellite imagery using STAC catalogs.
I conducted the review from the perspective of both an end-user and a developer. I believe that both audiences should be addressed by the package.
To reproduce, I used version `1.4.2-3` of the Docker image of the SITS package, with R version 4.3.0 and GDAL 3.0.4.
The documentation is comprehensive and covers all functions of the `rsi` package, including examples of how to use them.
1- I have only one suggestion, regarding the parameters `gdalwarp_options` and `gdal_config_options`, which have very large default values. This somewhat clutters the code documentation. Could these GDAL values be stored in a configuration file, such as `.json` or `.yaml`, or even as environment variables? Using external configuration files allows for better organization of the code, and you could then remove these two parameters from your functions.
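To make the suggestion concrete, the defaults could live in a shipped JSON file and be read once (a sketch only -- the file name and keys are my invention, and the use of `jsonlite`, which rsi already imports, is an assumption about how this might be wired up, not part of rsi):

```r
# Sketch: store the long GDAL defaults in a JSON file and read them once,
# instead of embedding them in every function signature. File name, keys,
# and option values here are hypothetical.
cfg_file <- tempfile(fileext = ".json")
writeLines('{
  "gdalwarp_options": ["-r", "bilinear", "-multi"],
  "gdal_config_options": {"GDAL_NUM_THREADS": "ALL_CPUS"}
}', cfg_file)

cfg <- jsonlite::read_json(cfg_file, simplifyVector = TRUE)
cfg$gdalwarp_options  # character vector: "-r" "bilinear" "-multi"
```

In the package itself, `cfg_file` would point at a file under `inst/` via `system.file()` rather than a tempfile.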
All tested examples worked correctly. However, I have some questions, listed below:
1- In the example for `alos_palsar_mask_function()`, two warnings appeared, listed below:
```r
aoi <- sf::st_point(c(-74.912131, 44.080410))
aoi <- sf::st_set_crs(sf::st_sfc(aoi), 4326)
aoi <- sf::st_buffer(sf::st_transform(aoi, 5070), 100)
palsar_image <- get_alos_palsar_imagery(
  aoi,
  start_date = "2021-01-01",
  end_date = "2021-12-31",
  mask_function = alos_palsar_mask_function
)
#> Warning messages:
#> 1: In CPL_gdalwarp(source, destination, options, oo, doo, config_options, :
#>   GDAL Message 1: Setting nodata to nan on band 2, but band 1 has nodata at nan. The TIFFTAG_GDAL_NODATA only support one value per dataset. This value of nan will be used for all bands on re-opening
#> 2: In CPL_gdalwarp(source, destination, options, oo, doo, config_options, :
#>   GDAL Message 1: Setting nodata to nan on band 3, but band 1 has nodata at nan. The TIFFTAG_GDAL_NODATA only support one value per dataset. This value of nan will be used for all bands on re-opening
```
Is this behavior expected? Additionally, in this example, wouldn't it be more appropriate to save the image in a temporary directory? From a user's perspective, these warnings might scare new users.
2- The example in the `calculate_indices()` documentation produces an error. I understand that you are deliberately demonstrating an example that does not work, but users sometimes open the documentation, copy the code, execute it, and then test it. Instead of the error, perhaps providing more examples of how to use the function would be helpful.
```r
example_indices <- filter_platforms(platforms = "Sentinel-1 (Dual Polarisation VV-VH)")[1, ]
example_indices$formula <- 'system("echo something bad")'
try(
  calculate_indices(
    system.file("rasters/example_sentinel1.tif", package = "rsi"),
    example_indices,
    tempfile(fileext = ".tif")
  )
)
#> Error in system("echo something bad") : could not find function "system"
```
3- The functions that access catalogs -- `get_stac_data()`, `get_sentinel1_imagery()`, `get_sentinel2_imagery()`, `get_landsat_imagery()`, `get_naip_imagery()`, `get_alos_palsar_imagery()`, and `get_dem()` -- share the same documentation. This is indeed an excellent solution and saves time for both the user and the developer. To highlight the features of the package, one recommendation is to include new examples for other types of data, such as radar, since the names of the assets vary for each collection.
The code is well organized and well developed. Additionally, it has excellent coverage. I have some suggestions regarding the default values of certain parameters in your functions.
1- By default, the functions that download satellite images use random names for the output files. I believe that using random names for satellite images could pose challenges for end users.
Here are a few points to consider:
I understand that some providers do not standardize the names of assets. However, your implementation allows you to create a standard filename for each collection, given that there are separate functions for each. I believe a good default approach is to maintain the original asset names from the providers.
2- I find the idea of simplifying the use of masks and allowing users to insert a new function quite interesting. However, I have some questions about the default values that were used:
In the `sentinel2_mask_function()` function, I would like to understand why the value `2` (shadows) is kept as a non-masked value?
Regarding Landsat, I believe the filtering of values to be retained could be improved. Additionally, the implementation could be simpler if it worked with bit values rather than integers; with integers, we are limited to only a few combinations.
Consider the values for the QA_PIXEL band:
0 : "Missing data"
1 : "Dilated Cloud"
2 : "High confidence cirrus"
3 : "High confidence cloud"
4 : "High confidence cloud shadow"
5 : "High confidence snow cover"
6 : "Clear"
7 : "Water"
8 : "Low/High confidence of cloud"
9 : "Medium/High confidence of cloud"
10: "Low/High confidence of cloud shadow"
11: "Medium/High confidence of cloud shadow"
12: "Low/High confidence of snow"
13: "Medium/High confidence of snow"
14: "Low/High confidence of cirrus"
15: "Medium/High confidence of cirrus"
Using bits allows for a wide range of combinations to filter values. One package that implements this approach is SITS, which produces official land use and land cover data for the Brazilian states. See an implementation example using the bitwise operators:
```r
interp_values <- c(0, 1, 2, 3, 4, 5, 9, 11, 13, 15)
bitwAnd(values, sum(2^interp_values))
```
Here, `values` represents the image values. This method makes it easier to filter the numerous combinations that may exist when using this band.
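A toy demonstration of that pattern (the bit positions and pixel values below are made up for illustration, not real QA_PIXEL data):

```r
# Bits 0, 1, and 3 are flagged for masking; any pixel with at least one
# of those bits set matches the mask.
interp_values <- c(0, 1, 3)
mask <- sum(2^interp_values)       # 2^0 + 2^1 + 2^3 = 11
values <- c(0L, 2L, 16L, 10L)      # hypothetical pixel values
bitwAnd(values, mask) > 0
#> [1] FALSE  TRUE FALSE  TRUE
```

Any subset of the sixteen bits can be masked by changing `interp_values`, which is exactly the flexibility integer whitelists lack.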
I find these filtering functions interesting, but I believe they may not be scalable. For instance, you provide two filtering functions: `filter_platforms()` and `filter_bands()`. What if a user wants to filter by cloud percentage within items? Or apply more complex filters to the properties of each item?

I understand that you provide an `item_filter_function` parameter, but in all these filtering functions you filter values using `vapply`. The `rstac` package includes a function, `items_filter()`, to facilitate item filtering. I believe using this function would simplify applying more complex filters based on items.
I imagine that in the future your `calculate_indices()` function will support expressions through the dots parameter (`...`). I'd like to provide an example implementation from the SITS package. The `sits_apply()` function allows users to generate indices from a high-level expression they provide, like this:
```r
sits_apply(NDVI_norm = (NDVI - min(NDVI)) / (max(NDVI) - min(NDVI)))
```
This approach enables users to utilize native R functions such as `max`, `min`, and `mean`.
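The expression-capture mechanism behind such an interface can be sketched in a few lines of base R (a toy illustration of the general pattern, not sits or rsi code; `apply_expr` and the band values are invented):

```r
# Capture named expressions from `...` unevaluated, then evaluate each one
# with the band vectors in scope.
apply_expr <- function(bands, ...) {
  exprs <- as.list(substitute(list(...)))[-1]  # drop the `list` head
  lapply(exprs, eval, envir = bands)
}

bands <- list(NDVI = c(0.1, 0.5, 0.9))
apply_expr(bands, NDVI_norm = (NDVI - min(NDVI)) / (max(NDVI) - min(NDVI)))
#> $NDVI_norm
#> [1] 0.0 0.5 1.0
```

In a real package this would need the same sandboxing concerns discussed earlier, since arbitrary expressions are being evaluated.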
Thank you so much for your review @OldLipe ! You raise a number of great points which will take me a little bit to respond to. I just wanted to quickly respond to one thing:
> I find these filtering functions interesting, but I believe they may not be scalable. For instance, you provide two filtering functions: `filter_platforms()` and `filter_bands()`. What if a user wants to filter by cloud percentage within items? Or apply more complex filters to the properties of each item?
Just to be clear, those functions filter the `spectral_indices()` data frame based on the platforms or bands a user has access to.
rsi supports using CQL2 to filter STAC collections, for instance to filter by cloud cover. CQL2 is the preferred approach for complex filters.
That said, the `items_filter_function` argument accepts `rstac::items_filter()` -- for instance, to filter on cloud cover, you could pass `` \(items, ...) rstac::items_filter(items, properties$`eo:cloud_cover` < 10) `` to that argument. This is less efficient than CQL2, though.
Thanks again for your review, and I'll put together a more comprehensive response in the next week or so! :smile:
@mdsumner just pinging to see if you are finished with the review yet.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation: `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 7
I really like this package; I see a lot of familiar experiences, and the result is a good and consistent set of functions for navigating this space. I haven't personally done anything with spectral indices; I usually explore imagery and how "it looks". I appreciate having all this ease-of-use tooling at hand, I will be pointing colleagues directly to this package, and I hope to stay involved in at least small ways.
Please make mention somewhere that "where the compute runs" (i.e. what "local" means) is important here. Performance differs between regions, and some pointers to how one might run in a region closer to where the usual data sources live (us-west2, for example) would help. I'm not suggesting that's an easy topic to cover, I'd just like to see it mentioned. It takes about 2x as long to run the examples here in Tasmania vs. computers in us-west (I know that comparison is a lot more complex than this). It might be an idea to point to ways of running computers on public systems that sit closer to the data, or at least to a guide on that.
I'm a bit disappointed in our community in that packages in R have tended towards doing "everything", and (not a criticism of this package) here we are doing a lot of things: getting file sources from STAC (a json web query), configuring for authentication, downloading specific files (a curl task), mosaicing and standardizing imagery (a GDAL task). I like that the top level functions are decomposed in this package itself, and the documentation is clear as to what underlying functions are used, and I can understand why parts of "foundational" packages are used with a mix of underlying generic tools for this complex stack of work. I'm massively impressed with how much this package actually does, and exposes in a general way down to the level of templated types of functionality and details like the GDALWarp options.
rsi is a little bit like a datacube, but with exposure of the underlying components at key steps. I really like how the asset is a lightly classed object with the other functions it needs stored as attributes, and that that is how the remaining arguments to the getter function are structured, so it seamlessly inherits each level but also remains flexible to user changes in that one call.
I'd like to see some validation of the raster values obtained in some examples; if there were an external example documented elsewhere (in a Python notebook, or another R package), it would be neat to have a clear comparison of the scale(s) of the raster files obtained here against an independent example. (I had intended to do this myself ...)
I was looking for an example where I can make a true-colour image with RGB. I don't understand the scaling that occurs in the examples: we have 0-1 scale numbers, which don't work with terra::plotRGB() or its stretch argument off the shelf, and leave me unclear about the scaled ranges and whether I am plotting an RGB image correctly. The first example in the README plots an RGB image as separate bands, and I'm unclear what to do to plot that as an "image". Again, I had meant to provide an actual example here. I believe the advice should be:
```r
plotRGB(stretch(x))
## note that this provides different defaults to stretch() itself
plotRGB(x, stretch = "lin")
```
but also, it's open to interpretation and expert use afaics.
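For concreteness, a runnable sketch of that advice, assuming `x` is a SpatRaster whose first three bands are red, green, and blue (band order and value scaling depend on the collection and rescaling options used, so this is illustrative only):

```r
library(terra)

# Option 1: rescale values to 0-255 with stretch(), then plot
plotRGB(stretch(x))

# Option 2: let plotRGB() apply a linear stretch directly
# (this uses different defaults than stretch() itself)
plotRGB(x, r = 1, g = 2, b = 3, stretch = "lin")
```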
Two of these three files have no data in the aoi region.
```r
qfiles <- get_landsat_imagery(aoi,
  start_date = "2022-06-01", end_date = "2022-06-30",
  composite_function = NULL)
```
I tried this naive thing, to run rsi_query_api, and I think at the least it should error fast when not getting a bbox in longlat.
```r
> rsi_query_api( aoi,
+   start_date = "2022-06-01",
+   end_date = "2022-06-30")
Error in rsi_query_api(aoi, start_date = "2022-06-01", end_date = "2022-06-30") :
  argument "stac_source" is missing, with no default
> rsi_query_api( aoi, stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
+   start_date = "2022-06-01",
+   end_date = "2022-06-30")
Error in rsi_query_api(aoi, stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/") :
  argument "collection" is missing, with no default
> rsi_query_api( aoi, stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
+   start_date = "2022-06-01",
+   end_date = "2022-06-30", collection = "sentinel2-c1-l2a")
Error in Ops.sfg(bbox[[2]], bbox[[4]]) :
  operation > not supported for sfg objects
> rsi_query_api( sf::st_bbox(aoi), stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
+   start_date = "2022-06-01",
+   end_date = "2022-06-30", collection = "sentinel2-c1-l2a")
Error in rsi_query_api(sf::st_bbox(aoi), stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/") :
  argument "limit" is missing, with no default
> rsi_query_api( sf::st_bbox(aoi), stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
+   start_date = "2022-06-01",
+   end_date = "2022-06-30", collection = "sentinel2-c1-l2a", limit = 100)
```
it could detect that the bbox input is not in longlat and 1) error or 2) just transform it.
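A minimal sketch of that suggested guard, assuming `aoi` is an sf or sfc object (the helper name is hypothetical, not part of rsi):

```r
# Hypothetical helper: error if the AOI has no CRS, otherwise transform
# to EPSG:4326 before taking the bbox used in the STAC query.
longlat_bbox <- function(aoi) {
  crs <- sf::st_crs(aoi)
  if (is.na(crs)) {
    stop("`aoi` must have a CRS so it can be transformed to EPSG:4326.")
  }
  if (crs != sf::st_crs("EPSG:4326")) {
    aoi <- sf::st_transform(aoi, "EPSG:4326")
  }
  sf::st_bbox(aoi)
}
```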
It's often useful to buffer your aoi object slightly, on the order of 1-2 cell widths, in order to ensure that data is downloaded for your entire AOI even after accounting for any reprojection needed to compare your AOI to the data on the STAC server.
Here I would rather reproject the extent; this is not so hard to do and exists in a few places (raster::projectExtent, reproj::reproj_extent, and terra::project), but possibly the best option is to use GDAL warp (via sf, as elsewhere here) to reproject an empty raster. (Happy to follow up to illustrate.)
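One way to sketch that extent-reprojection alternative with terra, by projecting an empty template raster rather than buffering the AOI (the 30 m resolution is an arbitrary placeholder):

```r
library(terra)

# Assume `aoi` is an sf polygon in a projected CRS.
template <- rast(vect(aoi), resolution = 30)  # empty raster on the AOI's grid
warped   <- project(template, "EPSG:4326")    # warp the empty grid to longlat
ext(warped)                                   # the reprojected extent
```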
This function can either download all data that intersects with your spatiotemporal AOI as multiple files (if composite_function = NULL), or can be used to rescale band values, apply a mask function, and create a composite from the resulting files in a single function call
On this point, GDAL warp can do these things itself (it is a monolith of abstractions in its own right), and we could possibly avoid downloading entire scenes. I mention this not as a strong suggestion, but mainly as an invitation to discuss further (I'm looking at the rise of {gdalraster} here).
It can be a good idea to tile your aoi using sf::st_make_grid()
I think this really needs an example, because there's room for guidance on creating a nice tile mosaic definition (with terra::align, for example, or with st_tile or getTileExtents). It's a very important point: say we wanted (CONUS) state-level imagery. This can be done a little more abstractly using terra::makeTiles and stars::st_tile (which helpfully give the index and extents, but separately). (Maybe an assign-to-me task.)
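A sketch of what a tiled download might look like using sf::st_make_grid() (the grid size and file-naming scheme are arbitrary illustrations):

```r
# Split the AOI into a 4x4 grid and download each tile separately.
tiles <- sf::st_make_grid(aoi, n = c(4, 4))

tile_files <- lapply(seq_along(tiles), function(i) {
  get_landsat_imagery(
    aoi = sf::st_as_sf(tiles[i]),
    start_date = "2022-06-01",
    end_date = "2022-06-30",
    output_filename = sprintf("landsat_tile_%02d.tif", i)
  )
})
```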
If you set the rsi_pc_key environment variable to your key (either primary or secondary; there is no difference), rsi will automatically use this key to sign all requests against Planetary Computer.
I only note this as a discussion-bounce, for possible followup: GDAL can do this too, and has file system abstractions that can copy, info, or warp-in-part to a target grid.
In the vignette I like when there's "one change", that's an excellent situation when you can tweak major changes with only one tiny plumbing modification.
This example (in the README) should save the file name as an output. It's otherwise a pipeline that I don't get an object for in the end, and it can take some time (in Australia).
```r
projected_ashe <- sf::st_transform(ashe, 6542)
get_landsat_imagery(
  aoi = projected_ashe,
  start_date = "2021-06-01",
  end_date = "2021-06-30",
  output_filename = tempfile(fileext = ".tif")
) |>
  terra::rast() |>
  terra::plot()
```
I have some minor discomforts about when exactly warping or masking is done. I just want to talk about the details of this and might follow up, I'm a bit lost in the depths of the compositing function and use of sprc and mosaic, though.
A standalone question I had: can we use the aoi to drive the STAC query, but then download the files as-is without cropping/warping or compositing? I guess that would mean providing a link to the temp-file space being used.
Fin. Thanks so much for such a great package, and much apology for being so late here (especially appreciative of the patience of @mikemahoney218 and @jhollist).
Thank you so much, @mdsumner ! I'm excited to dive into your review.
As I said to Jeff over email, I also haven't been a paragon of timeliness here -- I got started replying to Felipe three weeks ago, and then immediately lost my focus. I'm hoping to carve out time to respond to both reviews starting on Wednesday of this week!
Thank you so, so much for your review here @OldLipe ! I feel like your comments have really improved the package. I'll walk through specific changes below, but first I wanted to ask about one remaining comment:
1- By default, the functions that download satellite images use random names for the images. I believe that using random names for satellite images could pose challenges for end users.
The issue I have with generating non-random names is that I feel users are in a better position than I am to know how to organize their files within a project. Users downloading multiple files are likely iterating through either collections, time ranges, or spatial areas of interest, and probably have a pre-existing idea of what distinguishes each file (and therefore would make for a good name). This is why, for instance, the documentation shows examples of providing your own `output_filename` values when you know how your downloads are being "chunked".
So the idea behind random filenames is that it's something that is good enough for fast proof-of-concept "does this function work" tests, but is clearly not good enough for "real" usage. I'm hoping to force users to come up with their own file name conventions, rather than accepting the default options. This is also why I do handle the auto-naming when `composite_function = NULL`, because I am very confident that I know what differentiates each file (the datetime) in that situation.
I'm curious what you think about this reasoning. If anything, I'd lean towards removing the default filenames altogether, but I do think they're useful for quick evaluation of the package.
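To illustrate that reasoning, a sketch of the kind of user-chosen naming the documentation encourages when downloads are "chunked" by time range (the months and naming scheme here are arbitrary):

```r
months <- c("2022-06", "2022-07", "2022-08")

files <- vapply(months, function(m) {
  get_landsat_imagery(
    aoi,
    start_date = paste0(m, "-01"),
    end_date = paste0(m, "-28"),
    output_filename = paste0("landsat_", m, ".tif")  # name encodes the chunk
  )
}, character(1))
```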
1- [R]egarding the parameters `gdalwarp_options` and `gdal_config_options` [...] Could these GDAL values be stored in a configuration file?
I moved these into functions: https://github.com/Permian-Global-Research/rsi/pull/77/commits/5adacb2bbe5cc8ab9cae57e7a57db585c7ab67d6
1- In the example of the `alos_palsar_mask_function()`, two warnings appeared. Listed below: [...] Is this behavior expected? Additionally, in this example, wouldn't it be more appropriate to save the image in a temporary directory? From a user's perspective, these warnings might scare new users.
I couldn't reproduce the warning (though I've seen it before, I just don't understand what triggers it) but I updated the documentation with a gdalwarp option to silence the warning: https://github.com/Permian-Global-Research/rsi/pull/77/commits/b1a15bab095e27cd242b45feb55704864dc96539
I also saved to a temporary file -- thanks! Because these examples don't run on CRAN, that one is a really easy thing to accidentally miss :sweat_smile: There were a few more of these here: https://github.com/Permian-Global-Research/rsi/pull/77/commits/fa7dc91c3d5eb86ac087d9cc62cf756ebc32b1ce
2- The example in the `calculate_indices()` function presents an error. I understand that you are demonstrating an example that does not work. Users sometimes enter the documentation, copy the code, execute it, and then test it. Instead of the error, perhaps providing more examples of how to use the function would be helpful.
I left the error in place, because I want to explain why this function intentionally doesn't let users use arbitrary functions. But I added more documentation around it: a comment right above the `try()` to emphasize that the error is expected, and then another block below it to show how to work around this design: https://github.com/Permian-Global-Research/rsi/pull/77/commits/e2ec1378b50c0c756f84f502ec2298a9bc63a5c0
3- The functions that access catalogs [...] share the same documentation. [...] [I]nclude new examples for other types of data, such as radar, since the names of the assets vary for each collection.
I added additional examples to this document, using multiple data-retrieving functions and showing how to work with band mapping objects: https://github.com/Permian-Global-Research/rsi/pull/77/commits/7b82f3d115da15ef29d4d3415b1c462170b68212
* In the `sentinel2_mask_function()` function, I would like to understand why the value `2` (`shadows`) is kept as a non-masked value?
This was a straight-up mistake; thank you. Fixed here: https://github.com/Permian-Global-Research/rsi/pull/77/commits/f499c9507f8a06a318c42a904aa3538ccf779baf
* Regarding Landsat, I believe there could be improvements in filtering the values to be retained. Additionally, the implementation could be easier if working with bit values rather than integers.
Thank you for the examples here; without them, I absolutely never would have figured out how bitmasks work. I added a `masked_bits` argument to the Landsat mask function, which can take vectors of bits to mask out. The `include` argument now works by setting these bits: https://github.com/Permian-Global-Research/rsi/pull/77/commits/f499c9507f8a06a318c42a904aa3538ccf779baf
I find these filtering functions interesting, but I believe they may not be scalable. For instance, you provide two filtering functions: `filter_platforms()` and `filter_bands()`. What if a user wants to filter by cloud percentage within items? Or apply more complex filters to the properties of each item?
Discussed in https://github.com/ropensci/software-review/issues/636#issuecomment-2218735327
Thank you again -- this was a massively helpful review. Let me know if I missed anything (or made anything worse by mistake :smile:)
@mdsumner Thank you for your review and no worries on the timeline. We are very much appreciative (and understanding) of the time that all of our reviewers dedicate to rOpenSci! Keep an eye out for revisions and once those are in, use our review template (https://devguide.ropensci.org/approval2template.html) to indicate your approval of the revisions or if other changes are needed.
@OldLipe Thank you for your review as well! As you can see @mikemahoney218 has addressed your review. When you have a chance could you take a look at that and let us know if his revisions address your concerns or if you would like to see some additional changes. As mentioned above, please use the review template (https://devguide.ropensci.org/approval2template.html) for this.
For both of you, can you provide me with a rough estimate of hours spent on the review? This is something we keep track of.
Thank you all!
Thank you so much, Mike! This was an incredibly useful review process. I've responded to your specific comments below:
Please make mention somewhere that "where the compute runs" (i.e. what "local" means) is important here. There's an issue with performance in different regions, and some pointers to how one might run in a region closer to where the usual data sources are hosted (us-west2, for example) would help. I'm not suggesting that's an easy topic to cover; I'd just like to see it mentioned. It takes about 2x as long to run the examples here in Tasmania as on computers in us-west (I know that comparison is a lot more complex than this). It might be an idea to point to ways of running computers on public systems that run closer to the data, or at least a guide to that.
I added a small mention of this to the README and a larger mention to the Downloading vignette: https://github.com/Permian-Global-Research/rsi/pull/78/commits/55ad0d957bd54a4af9a4300803b4f412d6950684
I'd like see some validations of the raster values obtained in some examples, if there was an external example that's documented elsewhere (in a python notebook, or another R package) it would be neat to have a clear comparison of the scale/s shown be these raster files obtained here against an independent example. (I had intended to do this myself ...)
I'm curious if you have ideas on an efficient way to do this. I've tried to stub out tests to do this, but am running into issues that I wrote rsi to fix some of the rough edges I found when downloading data sets, which means I'm effectively re-implementing rsi's download functions in the test itself to try and square the circle. To give a more concrete example: I can download the assets of an item using `rstac::assets_download()`, but that takes quite some time as it requires downloading the entire tile. I can work around that by using GDAL to do a partial download, but then I need to start writing the options for gdalwarp... which starts becoming just the code inside rsi itself again. I'm having a hard time thinking of a clever way to not need to download entire images but also not just copying package internals to confirm that they agree with the package itself.
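For reference, the partial-download workaround mentioned above can be sketched with sf::gdal_utils() and a /vsicurl/ path; the URL and target window below are placeholders (a real Planetary Computer asset URL would also need signing):

```r
sf::gdal_utils(
  util = "warp",
  source = "/vsicurl/https://example.com/some-cog.tif",  # placeholder URL
  destination = tempfile(fileext = ".tif"),
  options = c(
    "-t_srs", "EPSG:4326",
    "-te", "147.0", "-43.0", "147.5", "-42.5"  # xmin ymin xmax ymax window
  )
)
```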
I was looking for an example where I can make a true colour image with RGB. I don't understand the scaling that occurs in the examples (we have 0-1 scale numbers, which don't work with terra::plotRGB or its stretch argument off the shelf), and that leaves me unclear about the scaled ranges and whether I am plotting an RGB image correctly. The first example in the readme plots an RGB image as separate bands, and I'm unclear what to do to plot that as an "image". Again, I had meant to provide an actual example here. I believe the advice should be:
```r
plotRGB(stretch(x))
## note that this provides different defaults to stretch() itself
plotRGB(x, stretch = "lin")
```
but also, it's open to interpretation and expert use afaics.
Added to the "How can I" vignette: https://github.com/Permian-Global-Research/rsi/pull/78/commits/528e31f7d62045471ac9233ae8934bfa4da7d445
And to the README and the `get_stac_data()` examples: https://github.com/Permian-Global-Research/rsi/pull/78/commits/7c2c0cfd8bbddc5f1a519aa18eb829a2afdfdd4f
Specific notes
Two of these three files have no data in the aoi region.
```r
qfiles <- get_landsat_imagery(aoi, start_date = "2022-06-01", end_date = "2022-06-30", composite_function = NULL)
```
I tried this naive thing, to run rsi_query_api, and I think at the least it should error fast when not getting a bbox in longlat.
it could detect that the bbox input is not in longlat and 1) error or 2) just transform it.
Fixed in https://github.com/Permian-Global-Research/rsi/pull/78/commits/d0acb46a87317cf643b630c926ca444caabeb3e8 . The documentation was also just wrong here; this function needed a bbox, not an sfc object. I changed things so either works. You can tell this function was pulled out from `get_stac_data()` relatively late in the game :sweat_smile:
It's often useful to buffer your aoi object slightly, on the order of 1-2 cell widths, in order to ensure that data is downloaded for your entire AOI even after accounting for any reprojection needed to compare your AOI to the data on the STAC server.
Here I would rather reproject the extent; this is not so hard to do and exists in a few places (raster::projectExtent, reproj::reproj_extent, and terra::project), but possibly the best option is to use GDAL warp (via sf, as elsewhere here) to reproject an empty raster. (Happy to follow up to illustrate.)
I'm not sure I entirely follow you here! Would you be able to give an example?
This function can either download all data that intersects with your spatiotemporal AOI as multiple files (if composite_function = NULL), or can be used to rescale band values, apply a mask function, and create a composite from the resulting files in a single function call
On this point, GDAL warp can do these things itself (it is a monolith of abstractions in its own right), and we could possibly avoid downloading entire scenes. I mention this not as a strong suggestion, but mainly as an invitation to discuss further (I'm looking at the rise of {gdalraster} here).
Unfortunately I don't think gdalwarp can handle complicated compositing yet: https://github.com/OSGeo/gdal/issues/5176 And I haven't found any way to make it handle masking, either.
With regards to simple composites ("latest pixel wins" style), this is one of the messiest parts of `get_stac_data()`, but we actually do use gdalwarp directly to handle those. Specifically, these lines check if we can get away with using the warper to stamp a bunch of images together:
The output of that gets passed as `merge` to the download function:
Then this is where things get really silly: if `merge == TRUE`, we only create a single output file for each asset:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L64-L76
And then we wind up calling this warp with multiple `source` URLs and only the single output path, meaning we warp all the files while downloading:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L116-L123
This is actually a lot less complicated than it used to be -- there used to be an `rsi_simple_download` and an `rsi_complex_download` to handle the warpable/not-warpable downloads separately, which didn't share any code paths and as a result quickly got out of sync. Handling both via the same path makes things less readable, but means that all downloads flow down the same (heavily tested) pathway.
All that said, I documented this a bit more here: https://github.com/Permian-Global-Research/rsi/pull/78/commits/0b294d3f879e571e07508664a9c0426e7272d705
With regards to rescaling via the warper -- I'm definitely interested in this, but I've seen some rather complex rescaling formulas in the wild that aren't just a simple scale and offset, which has made me a bit spooked. I think there's still some dark magic you can do by writing a VRT with a complex transform equation, but I don't know that I understand VRTs well enough to maintain a package that did that, right now.
It can be a good idea to tile your aoi using sf::st_make_grid()
I think this really needs an example, because there's room for guidance on creating a nice tile mosaic definition (with terra::align, for example, or with st_tile or getTileExtents). It's a very important point: say we wanted (CONUS) state-level imagery. This can be done a little more abstractly using terra::makeTiles and stars::st_tile (which helpfully give the index and extents, but separately). (Maybe an assign-to-me task.)
I added an example of using `st_make_grid()` here: https://github.com/Permian-Global-Research/rsi/pull/78/commits/f95714f51894d1f848ec051e92b1d8227eb04cfb
This example (in the README) should save the file name as an output. It's otherwise a pipeline that I don't get an object for in the end, and it can take some time (in Australia).
I have some minor discomforts about when exactly warping or masking is done. I just want to talk about the details of this and might follow up, I'm a bit lost in the depths of the compositing function and use of sprc and mosaic, though.
Yeah, it's an easy function to get lost in. You can see my own notes here from the last time I was refactoring:
The steps are usually querying, filtering down the returned results, (warping and) downloading the relevant items, masking them, compositing the outputs, and then rescaling.
The warping is controlled primarily by the `gdalwarp_options` object. The first "live" code (not just parameter checking) is this bit here, which processes those options:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L295-L300
That processing function just sets the `t_srs` and `tr` options, if they weren't provided, to handle the actual warp:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L781-L798
Which then eventually gets passed to the actual download call: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L116-L123
So that's warping and downloading handled: each asset is (usually) warped and downloaded separately.
Each asset then gets masked independently: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L388-L394
These are masked independently mostly to make the implementation easier, because now each asset is composited into a single file per asset: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L399-L409
I didn't want to try and track if all assets existed in all items, or so on, and so we don't aggregate assets into items until after rescaling.
As for compositing: there are three pathways here. The first one I discussed above: if files don't need to be masked or rescaled, we composite them during the download stage and they skip the composite process entirely.
The second one is if users specified `"merge"` but also wanted a mask or rescaling, in which case we're basically just calling `terra::merge()`:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L690-L697
The third is for all the other functions, which are applied using `terra::mosaic()`:
As far as I'm concerned, `sprc()` is just a container that can handle one or more possibly overlapping rasters (compared to `rast()`, which can't). We're using that to hold the unknown number of possibly-overlapping files associated with a single asset, then merging them using some terra function.
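A toy illustration of that container behaviour, with values chosen so the overlap is obvious (this is synthetic data, not rsi output):

```r
library(terra)

# Two overlapping rasters: x in [0, 10] and x in [5, 15]
r1 <- rast(xmin = 0, xmax = 10, ymin = 0, ymax = 10, resolution = 1, vals = 1)
r2 <- rast(xmin = 5, xmax = 15, ymin = 0, ymax = 10, resolution = 1, vals = 2)

# Combining these into one SpatRaster would fail (extents differ);
# sprc() holds them as a collection, which mosaic() can then reduce.
composite <- mosaic(sprc(r1, r2), fun = "mean")
# the overlapping strip (x in [5, 10]) averages to 1.5
```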
Hope this all made sense!
A standalone question I had: can we use the aoi to drive the STAC query, but then download the files as-is without cropping/warping or compositing? I guess that would mean providing a link to the temp-file space being used.
I added a section to the "How Can I" vignette about one version of this -- downloading each item separately: https://github.com/Permian-Global-Research/rsi/pull/78/commits/528e31f7d62045471ac9233ae8934bfa4da7d445
(To skip masking, you'd also set `mask_function = NULL`.)
If you want to get each asset separately, that's probably where `rstac::assets_download()` becomes more useful -- or building a query with rstac and using GDAL to grab items yourself. I think at that point, you no longer want the additional abstraction rsi gives you, and it makes sense to go one level "lower" down the stack.
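A sketch of that lower-level path with rstac (the collection name, bbox, and asset name are illustrative, and exact rstac argument names may vary by version):

```r
library(rstac)

items <- stac("https://planetarycomputer.microsoft.com/api/stac/v1/") |>
  stac_search(
    collections = "sentinel-2-l2a",
    bbox = c(147.0, -43.0, 147.5, -42.5),
    datetime = "2022-06-01/2022-06-30"
  ) |>
  get_request() |>
  items_sign(sign_fn = sign_planetary_computer())

# Download one asset per item, as-is, with no cropping or compositing
assets_download(items, asset_names = "B04", output_dir = tempdir())
```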
Ok amazing, love these responses and the changes you've made. I think you'll need to link to this issue in the Details section, because it's a really great section on the concerns and the journey you've been on. (I'm getting more enmeshed in the python side via odc and so each time I come to rsi I have more perspective and learn a lot more).
There's no showstoppers now from my perspective, I think you've responded to this review brilliantly and I'm stoked with all the updates you've made, and the explanations. Please consider my take as 'approved'. @jhollist
I'm curious if you have ideas on an efficient way to do this.
I actually didn't mean automated testing validation, just a real-world example where we can get confirmation of the values we see in a small context. I will follow up when I can, but I don't consider this a blocker or anything.
Also, I need to follow up here (I can't remember exactly now; I may have been thinking about a different part of the help content). When I explore again I will bring this up, outside this review, as an issue/discussion piece.
I'm not sure I entirely follow you here! Would you be able to give an example?
Thanks!!
Looks like we are really close on this one!
@OldLipe do you feel that @mikemahoney218 has addressed the issues raised in your review?
@mdsumner Thank you for the follow up and the approval! How many hours do you think you spent on the review?
You both can use this template for your response: https://devguide.ropensci.org/approval2template.html or you can just provide that directly as well!
@mdsumner and @OldLipe Just trying to finalize this. See https://github.com/ropensci/software-review/issues/636#issuecomment-2358634766
7 hours was my tally 🙏
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/636#issuecomment-2283110229 time 7
Date accepted: 2024-10-01
Submitting Author Name: Mike Mahoney
Submitting Author Github Handle: @mikemahoney218
Repository: https://github.com/Permian-Global-Research/rsi/
Version submitted:
Submission type: Standard
Editor: @jhollist
Reviewers: @mdsumner, @OldLipe
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
This package supports (spatial) data retrieval from APIs implementing the OGC STAC API standard, processing the downloaded data (including automated masking, compositing, and rescaling), computing spectral indices from those data, and wrangling the outputs into formats useful for modeling and visualization.
Anyone with a need to download and process spatial data, particularly remote sensing data and satellite-based Earth observation rasters. We've used rsi to automate the entire data-preparation process of forest carbon and structure models (not yet published), but the package is broadly useful to anyone working in Earth surface modeling.
Yes:
There are a few minor things that I think rsi does better than other approaches (we sign items right before each one is downloaded, for instance, whereas some other packages sign items before starting to download the entire set, meaning the signature can expire causing large downloads to fail), but this difference I think is mostly a matter of taste. I'm very familiar with local GDAL, terra, and sf, and so rsi tries to get users back to working with local GDAL, terra, and sf as fast as possible.
NA
NA
`pkgcheck` items which your package is unable to pass. ~~I have never successfully gotten the CI item to be a check, and I have no idea why. I'm using the standard `usethis` functions to structure my CI, for what it's worth!~~ Apparently this is a local-only issue. I addressed failing covr in https://github.com/ropensci/software-review/issues/636#issuecomment-2028086514
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)).
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution.*)

Code of conduct