ropensci / software-review

rOpenSci Software Peer Review.

presubmission inquiry - prismaread #398

Closed lbusett closed 4 years ago

lbusett commented 4 years ago

Submitting Author: Lorenzo Busetto (@lbusett)
Repository: https://github.com/lbusett/prismaread/

Package: prismaread
Title: Import PRISMA L1/L2 hyperspectral data and convert them to 
 a more user friendly format
Version: 1.0.0
Authors@R: 
    c(person("Lorenzo", "Busetto", email = "lbusett@gmail.com", role = c("aut", "cre"),
    comment = c(ORCID = '0000-0001-9634-6038')), 
    person("Luigi", "Ranghetti", email = "ranghetti.l@irea.cnr.it", role = c("aut"), 
    comment = c(ORCID = '0000-0001-6207-5188')))
Description: `prismaread` allows easily importing PRISMA (http://www.prisma-i.it/index.php/it/) satellite
 hyperspectral data cubes and ancillary information from the original data provided by ASI in HDF format, 
and converting them to an easier-to-use format (ENVI or GeoTIFF). It also provides functionality for automatically 
computing Spectral Indexes from either the original HDF data or from hyperspectral data already converted
using function `pr_convert`, and for easily and quickly extracting  data and computing statistics for the 
different bands over areas of interest.
License: GPL-3
Encoding: UTF-8
LazyData: true
URL: https://lbusett.github.io/prismaread/
BugReports: https://github.com/lbusett/prismaread/issues
Imports: 
    hdf5r,
    tools,
    raster,
    exactextractr,
    data.table,
    dplyr,
    sf,
    tidyr,
    tidyselect,
    stringr,
    openxlsx,
    magrittr, 
    DT,
    rlang,
    assertthat
Suggests: testthat, 
    spelling,
    knitr,
    rmarkdown, 
    ggplot2,
    tibble,
    piggyback,
    usethis, 
    mapview
Remotes: 
    isciences/exactextractr
RoxygenNote: 7.1.1
VignetteBuilder: knitr

Scope

data munging because prismaread allows easily importing PRISMA data from the rather complex original HDF format and saving them to easier-to-use raster formats. It also allows extracting ancillary datasets stored alongside the hyperspectral information (e.g., cloud and land cover masks, acquisition angles, lat/lon matrices) and saving them as rasters aligned with the hyperspectral data. Finally, it provides functionality for automatically computing Spectral Indexes from the spectral cubes, and for accessing and storing spectral information over areas of interest.

geospatial because prismaread focuses on accessing and manipulating geospatial hyperspectral data, and converting them to easier-to-use geospatial data formats. The data made available by prismaread functionality can be used for geospatial analysis of different kinds.

The target audience consists mainly of Remote Sensing data practitioners interested in testing the capabilities of the innovative PRISMA hyperspectral sensor for analysing the characteristics of the Earth's surface. Scientific applications of PRISMA data are those typical of satellite hyperspectral imagery, including for example land cover/use characterization, extraction of biophysical parameters from spectral data, vegetation status monitoring, characterization of coastal waters, etc.

To our knowledge, no other package currently provides prismaread's functionality. PRISMA data importers are available in commercial software (e.g., ENVI, ERMapper), but we are not aware of any other open-source solution.

Yes

1) The testing suite of the package requires accessing rather large original PRISMA datasets. This is currently implemented by exploiting the functionality of package piggyback to upload and download/cache test files from the prismaread GitHub repository. This works well but requires a rather long time for downloading data when tests are executed for the first time (test coverage is already above 90%);

2) No API for directly downloading PRISMA data is currently available. Users have to register and request access to the data starting from http://prisma-i.it/index.php/en/, and then download the data of interest as zip files before being able to exploit prismaread functionality.

geanders commented 4 years ago

@lbusett : Thanks so much for your presubmission! We will be discussing if it would be in-scope. As a first question to help our discussion, it sounds like some of the tasks listed for "data munging" are perhaps closer to "data extraction", in terms of parsing scientific datasets. There is some more guidance in the rOpenSci book on these categories of scope, and in particular this description from "data extraction" could be helpful:

"Packages that aid in retrieving data from unstructured sources such as text, images and PDFs, as well as parsing scientific data types and outputs from scientific equipment."

Could you clarify a bit any separation between data extraction and data munging tasks in the package?

Also, in terms of comparing with other available software, could you talk about whether there are other more general packages for importing/extracting data from a HDF format, and how your package provides different or improved functionality in terms of working with the PRISMA data compared to using a more general R package alternative for HDF data?

lbusett commented 4 years ago

@geanders

thanks for your prompt reply! I'll answer inline.

> @lbusett : Thanks so much for your presubmission! We will be discussing if it would be in-scope. As a first question to help our discussion, it sounds like some of the tasks listed for "data munging" are perhaps closer to "data extraction", in terms of parsing scientific datasets. There is some more guidance in the rOpenSci book on these categories of scope, and in particular this description from "data extraction" could be helpful:
>
> "Packages that aid in retrieving data from unstructured sources such as text, images and PDFs, as well as parsing scientific data types and outputs from scientific equipment."
>
> Could you clarify a bit any separation between data extraction and data munging tasks in the package?

Yes, after reading the description more carefully I'd guess that the Data Extraction category could fit better, since prismaread's main objective is to parse/retrieve the data coming from a piece of scientific equipment (the PRISMA hyperspectral sensor) to make it more easily accessible to users. I was probably a bit misled by the "tools for handling data in specific scientific formats" description of the Data Munging category.

Concerning the separation between "data munging" and "data extraction": the main objective of prismaread is to access the original PRISMA data, retrieve the different available information and make it available to users as standard and more easily accessible raster datasets (see also the next reply). So, this would probably fit better in the "Data Extraction" category. What I termed "data extraction" functionality in my previous post refers to a function providing a dedicated wrapper around the exact_extract function of package exactextractr (see https://lbusett.github.io/prismaread/articles/Extracting-data-over-vector.html). It can be used to extract original data / statistics over areas of interest starting from an already preprocessed PRISMA spectral cube, retrieve them in either long or wide format, and save the retrieved data as RData, Excel or CSV files.
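As a rough illustration of the long/wide distinction mentioned above, the sketch below pivots per-polygon band statistics from long format (one row per polygon and wavelength) to wide format (one row per polygon). It is not prismaread code: the function name `long_to_wide` and the field names `id`, `wavelength` and `mean` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def long_to_wide(records):
    """Pivot long-format rows into one row per polygon.

    Each input record is a dict with the (hypothetical) keys
    "id", "wavelength" and "mean".
    """
    by_polygon = defaultdict(dict)
    for rec in records:
        # Group the statistic of each band under its polygon id.
        by_polygon[rec["id"]][rec["wavelength"]] = rec["mean"]

    wide = []
    for polygon_id, bands in by_polygon.items():
        row = {"id": polygon_id}
        # One column per wavelength, in increasing wavelength order.
        for wavelength, value in sorted(bands.items()):
            row[f"mean_{wavelength}"] = value
        wide.append(row)
    return wide


long_rows = [
    {"id": 1, "wavelength": 450.0, "mean": 0.1},
    {"id": 1, "wavelength": 550.0, "mean": 0.2},
    {"id": 2, "wavelength": 450.0, "mean": 0.3},
]
wide_rows = long_to_wide(long_rows)
```

The long form is usually easier to filter and plot by wavelength, while the wide form is closer to what one would save as a spreadsheet with one line per area of interest.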

> Also, in terms of comparing with other available software, could you talk about whether there are other more general packages for importing/extracting data from a HDF format, and how your package provides different or improved functionality in terms of working with the PRISMA data compared to using a more general R package alternative for HDF data?

Several R packages allow general access to HDF data, the main ones being hdf5r, raster and stars. However, HDF5 files can be considered general-purpose containers that can store a variety of types of data and metadata without any strict "rules" about how data is organized within them (see https://support.hdfgroup.org/HDF5/whatishdf5.html - HDF5 is very versatile, but sometimes at the expense of ease of use/access). Indeed, prismaread exploits the functionality of some of these more general packages (mainly hdf5r to access the HDF datasets, and raster to reshape/export them) to access the rather complex, specific structure of PRISMA raw HDF files. PRISMA data in fact contains several subdatasets (e.g., spectral data, ancillary data such as cloud masks or information about acquisition angles, ...), and those subdatasets vary according to the Processing Level of the PRISMA data (L2D = geocoded reflectances, L2C = non-geocoded reflectances, L2B = surface radiance, L1 = at-sensor radiance). It also contains metadata attributes that need to be accessed and used, for example, to geocode the images or apply scaling factors to the data. This makes properly importing the available geospatial data a non-trivial task.

In this context, prismaread makes it painless for the user to extract the datasets of interest and convert them to user-friendly formats (see https://lbusett.github.io/prismaread/articles/Importing-Level-1-Data.html; https://lbusett.github.io/prismaread/articles/Importing-Level-2-Data.html). This requires performing tasks such as matrix transpositions to transform the data cubes to a proper "x,y,wavelength" order, bow-tie geocoding to associate coordinates with L1/2B/2C data, parsing metadata attributes to retrieve wavelength information and the scaling factors required to convert raw data to physical units, and "parallel" processing of L1 and L2 data to be able to associate geocoding and acquisition angle information with L1 data as well.
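To make two of the steps listed above more concrete, here is a minimal sketch of an axis reordering and of a raw-to-physical-unit conversion. It is not prismaread's implementation. It assumes, purely for illustration, that the raw cube is stored in [row][band][column] order and that the scaling is linear between a minimum and a maximum value over a 16-bit range; the names `to_row_col_band`, `dn_to_physical`, `scale_min` and `scale_max` are hypothetical.

```python
def to_row_col_band(cube):
    """Reorder a nested-list cube from [row][band][col] to [row][col][band].

    The input axis order is an assumption made for this example.
    """
    n_rows = len(cube)
    n_bands = len(cube[0])
    n_cols = len(cube[0][0])
    return [
        [[cube[r][b][c] for b in range(n_bands)] for c in range(n_cols)]
        for r in range(n_rows)
    ]


def dn_to_physical(dn, scale_min, scale_max, dn_max=65535):
    """Linearly rescale a raw digital number to physical units.

    Assumes dn = 0 maps to scale_min and dn = dn_max maps to scale_max.
    """
    return scale_min + dn * (scale_max - scale_min) / dn_max


# A tiny cube with 1 row, 2 bands and 3 columns.
raw_cube = [[[1, 2, 3], [4, 5, 6]]]
reordered = to_row_col_band(raw_cube)

# Apply the (assumed) linear scaling to every value of the reordered cube.
scaled = [
    [[dn_to_physical(v, 0.0, 1.0) for v in pixel] for pixel in row]
    for row in reordered
]
```

After the reordering, each innermost list holds the full spectrum of one pixel, which is the layout a raster stack with one layer per wavelength expects.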

Compared with the "general" packages providing access to HDF data, prismaread is therefore specifically dedicated to parsing the format of PRISMA data and performing all operations needed to extract the information of interest.

I hope I have answered your questions sufficiently, but I can elaborate more if needed!

regards,

Lorenzo

lbusett commented 4 years ago

@geanders

Hi, sorry for bothering you: I was just wondering if there is any news on this presubmission enquiry.

regards,

Lorenzo

melvidoni commented 4 years ago

Hello @lbusett. Apologies for the delay. The EIC role is rotating again, and I'm taking over this inquiry. I'll address this with the editors and let you know, hopefully during the week.

melvidoni commented 4 years ago

Hello @lbusett. This package is straightforwardly in-scope under data munging and data extraction. We welcome a full submission when it is ready.

stefaniebutland commented 3 years ago

Very sad news. I just learned that Lorenzo Busetto passed away. Beautiful tribute and dedication in the MODIStsp package he co-developed with Luigi Ranghetti: https://docs.ropensci.org/MODIStsp/articles/lorenzo.html

ranghetti commented 3 years ago

Thank you @stefaniebutland for informing me about this presubmission. I will be glad to carry on the submission, but I will need some time to explore the code and become familiar with it. I will open a full submission in the coming months.