mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/dev
GNU General Public License v3.0

Re-work output of available resources and indicators to be more legible #236

Closed: karpfen closed this issue 5 months ago

karpfen commented 8 months ago

Background

As the number of resources and indicators grows, it becomes harder for users to keep track of which indicators are available.

In light of #232, it would also make sense to align resource and indicator names. In general, I think it is good to follow the convention of a source name (e.g. the data-providing organization) followed by the indicator (e.g. gfw_treecover) wherever possible.

One thing to consider, though: users may not know what acronyms such as gfw, gsw, gmw, ... stand for, so they need a convenient way to look them up. That is why I suggest we provide a...

Clear, readable table of available resources and indicators

available_indicators() and available_resources() do their job, but I'd like something more legible in a clean tabular form. I'm imagining something similar to this:

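library(mapme.biodiversity)
library(tibble)

# available_indicators() returns a named list with one entry per indicator
ind <- available_indicators()

# collapse each entry's fields into one comma-separated string per indicator
res <- vapply(ind, function(x) paste0(names(x$resources), collapse = ", "), character(1))
pm  <- vapply(ind, function(x) paste0(x$processing_mode, collapse = ", "), character(1))
arg <- vapply(ind, function(x) paste0(names(x$arguments), collapse = ", "), character(1))

tibble(
  indicator = names(ind),
  resources = unname(res),
  processing_mode = unname(pm),
  arguments = unname(arg)
)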

Ideally, the table would include one additional column that is more descriptive than the abbreviated indicator name alone.

To dos

Based on the points above, I suggest these action points. Anything is up for discussion of course.

goergen95 commented 8 months ago

Great proposal, I fully support this! One thing to keep in mind: if we implement #232, there is no longer a technical need for the register functions, since the arguments as well as the resource/indicator functions are provided to the backend by the user at run time. I think this actually benefits your proposal, because we could still keep the register functionality, but with the single purpose of producing a nice overview for users, e.g. by returning a tibble by default with variables yet to be determined. Also, experienced users who want to extend the backend with their own resources/indicators would no longer be required to use the register functions if they have no need for them.

goergen95 commented 6 months ago

Hi @karpfen, I wanted to revive this since I am currently reworking the register functionality, as mentioned in my last comment. I would appreciate your input on what information would be valuable to convey to users of the package via available_*(). I would need your feedback before the end of the month, since that is the deadline for announcing that the dev branch is available for testing.

karpfen commented 6 months ago

Hi @goergen95, in general I have relatively simple tables in mind that don't exhaustively cover all information on every indicator but rather give a compact overview of what's possible with the package. Right now, the available_* functions contain all the information, but you have to know to call something like names(available_indicators()) and then continue with available_indicators()$indicatorXYZ from there.

I think a solution that lists the resources/indicators in a single table, from which users can dig deeper with ?indicatorXYZ, would make these resources more accessible and discoverable in the end.

For how to achieve this, I see two options.

  1. "Auto-generated": This would eliminate the need for the register_* calls.
    • available_resources() generates a table listing the
      • title text of the reference document (if that is possible),
      • @name,
      • @format,
      • (@reference,)
      • (@source)
      of all documentation with @docType == 'data' and @keywords == 'resource'.
    • available_indicators() generates the same kind of table for all documentation with @docType == 'data' and @keywords == 'indicator'.

  2. Using information from register_indicator(): Here I'm basically imagining the approach from the initial post. It simply lists the contents that have been passed to the register_* functions in a table rather than a list. I imagine this is technically easier, although it would leave the burden of filling in the register_* calls on the developer.

What do you think, does that make sense?
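Option 1 could be prototyped with base R's Rd tooling. The sketch below is only an illustration under stated assumptions: it assumes resource and indicator docs are tagged with \docType{data} plus \keyword{resource} or \keyword{indicator}, as proposed above. The helpers rd_section(), first_or_na() and rd_overview() are hypothetical and not part of mapme.biodiversity; only tools::Rd_db(), tools::parse_Rd() and the "Rd_tag" attribute come from base R.

```r
# Hypothetical sketch of option 1: build an overview table from Rd metadata
# instead of register_*() calls. Assumes docs carry \docType{data} and
# \keyword{resource} or \keyword{indicator}.

# Return the flattened text of every top-level Rd section with the given tag
rd_section <- function(rd, tag) {
  rd <- unclass(rd)
  tags <- vapply(rd, function(el) {
    t <- attr(el, "Rd_tag")
    if (is.null(t)) "" else t
  }, character(1))
  vapply(rd[tags == tag], function(el) {
    # collapse nested text nodes and squeeze runs of whitespace
    trimws(gsub("\\s+", " ", paste(unlist(el), collapse = "")))
  }, character(1))
}

first_or_na <- function(x) if (length(x) > 0) x[[1]] else NA_character_

# One row per Rd file that is a data doc carrying the requested keyword;
# returns NULL when nothing matches
rd_overview <- function(db, keyword) {
  keep <- vapply(db, function(rd) {
    "data" %in% rd_section(rd, "\\docType") &&
      keyword %in% rd_section(rd, "\\keyword")
  }, logical(1))
  rows <- lapply(db[keep], function(rd) {
    data.frame(
      name   = first_or_na(rd_section(rd, "\\name")),
      title  = first_or_na(rd_section(rd, "\\title")),
      format = first_or_na(rd_section(rd, "\\format")),
      source = first_or_na(rd_section(rd, "\\source")),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}
```

With the package installed, rd_overview(tools::Rd_db("mapme.biodiversity"), "resource") would list every matching resource page; whether the real docs carry these exact tags is an assumption to verify.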

goergen95 commented 6 months ago

Thanks, a version is now up on dev at revision https://github.com/mapme-initiative/mapme.biodiversity/commit/69008e3e80995f8a1b017f0f2b180e8a9c879821. I opted for the non-automated approach for now. available_resources() returns a flat tibble with the columns name, type, source, and licence. available_indicators() now returns a tibble with the columns name, description, and resources, where resources is a nested list column holding the required resources in the same format as the output of available_resources().

> available_resources()
# A tibble: 23 × 4
   name                             type   source                                                                           licence
   <chr>                            <chr>  <chr>                                                                            <chr>  
 1 chirps                           raster https://www.chc.ucsb.edu/data/chirps                                             CC - u…
 2 esalandcover                     raster https://registry.opendata.aws/esa-worldcover-vito/                               CC-BY …
 3 fritz_et_al                      raster https://zenodo.org/record/7997885/                                               CC-BY …
 4 gfw_emissions                    raster https://data.globalforestwatch.org/datasets/gfw::forest-greenhouse-gas-emission… CC-BY …
 5 gfw_lossyear                     raster https://data.globalforestwatch.org/documents/tree-cover-loss/explore             CC-BY …
 6 gfw_treecover                    raster https://data.globalforestwatch.org/documents/tree-cover-2000/explore             CC-BY …
 7 global_surface_water_change      raster https://global-surface-water.appspot.com/download                                https:…
 8 global_surface_water_occurrence  raster https://global-surface-water.appspot.com/download                                https:…
 9 global_surface_water_recurrence  raster https://global-surface-water.appspot.com/download                                https:…
10 global_surface_water_seasonality raster https://global-surface-water.appspot.com/download                                https:…
# ℹ 13 more rows
# ℹ Use `print(n = ...)` to see more rows
> available_indicators()
# A tibble: 26 × 3
   name                   description                                                    resources       
   <chr>                  <chr>                                                          <list>          
 1 active_fire_counts     Number of detected fires by NASA FIRMS                         <tibble [1 × 4]>
 2 active_fire_properties Extraction of properties of fires detected by NASA FIRMS       <tibble [1 × 4]>
 3 biome                  Areal statistics of biomes from TEOW                           <tibble [1 × 4]>
 4 deforestation_drivers  Areal statistics of deforestation drivers                      <tibble [1 × 4]>
 5 drought_indicator      Relative wetness statistics based on NASA GRACE                <tibble [1 × 4]>
 6 ecoregion              Areal statstics of ecoregions based on TEOW                    <tibble [1 × 4]>
 7 elevation              Statistics of elevation based on NASA SRTM                     <tibble [1 × 4]>
 8 fatalities             Number of fatalities by group of conflict based on UCDP GED    <tibble [1 × 4]>
 9 gsw_change             Statistics of the surface water change layer by JRC            <tibble [1 × 4]>
10 gsw_occurrence         Areal statistic of surface water based on occurrence threshold <tibble [1 × 4]>
# ℹ 16 more rows
# ℹ Use `print(n = ...)` to see more rows
> available_indicators("treecover_area")[["resources"]]
[[1]]
# A tibble: 2 × 4
  name          type   source                                                               licence  
  <chr>         <chr>  <chr>                                                                <chr>    
1 gfw_lossyear  raster https://data.globalforestwatch.org/documents/tree-cover-loss/explore CC-BY 4.0
2 gfw_treecover raster https://data.globalforestwatch.org/documents/tree-cover-2000/explore CC-BY 4.0
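With this format, finding which indicators depend on a given resource means looking inside the nested resources column. A minimal base-R sketch, assuming only the column layout shown in the output above (name plus a list column of resource tables with name and type); resources_long() and indicators_using() are hypothetical helpers, not part of mapme.biodiversity:

```r
# Flatten the nested `resources` column of an available_indicators()-style
# table into one row per indicator/resource pair (hypothetical helper)
resources_long <- function(ind) {
  rows <- Map(function(indicator, res) {
    data.frame(
      indicator = rep(indicator, nrow(res)),
      resource  = res$name,
      type      = res$type,
      stringsAsFactors = FALSE
    )
  }, ind$name, ind$resources)
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Names of all indicators that require the given resource (hypothetical helper)
indicators_using <- function(ind, resource) {
  long <- resources_long(ind)
  unique(long$indicator[long$resource == resource])
}
```

With the package loaded, indicators_using(available_indicators(), "gfw_treecover") should include treecover_area, per the output above. In a tidyverse workflow, tidyr::unnest(ind, cols = resources, names_sep = "_") is the more idiomatic route; names_sep is needed because both the outer and the nested tibbles have a name column.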
goergen95 commented 6 months ago

@karpfen: another question: is renaming resources/indicators within the scope of this issue?

karpfen commented 6 months ago

I'm all for it. I think this is a good opportunity to make the naming more consistent.

goergen95 commented 6 months ago

Agreed, I would also like to reconsider how we name resources/indicators. I will post a proposal in this issue later this week so that others have some time to respond. I could imagine putting together a first draft towards the end of April and merging it into main with the second milestone towards the end of May (see #240).

goergen95 commented 6 months ago

Just reconsidered: I think it's best to limit the scope of this issue to the already implemented rework of the available_*() functions and to open a new issue for renaming resources/indicators.

goergen95 commented 6 months ago

After discussing with @Jo-Schie, I think discoverability for new users could be improved further:

goergen95 commented 6 months ago

@Jo-Schie and @karpfen: could you please look at the GitHub-rendered version of the README and leave some feedback?

karpfen commented 6 months ago

Cool, this is much more legible than before. One point regarding the code example: the code is wrapped in parentheses (starting at https://github.com/mapme-initiative/mapme.biodiversity/blob/dev/README.Rmd#L111); this only saves us a print() statement, right? I'd actually remove them, simply because the idiom may look unfamiliar to a lot of people.

Maybe the printing could even be done in a separate chunk with echo=FALSE, then it looks even cleaner. What do you think, @goergen95 ?
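For readers unfamiliar with the idiom discussed above: in R, an assignment returns its value invisibly, and wrapping it in parentheses makes the value visible, so it is auto-printed at the top level. A minimal base-R illustration (the variable x is arbitrary):

```r
# An assignment is evaluated invisibly, so nothing is printed here
x <- sqrt(16)

# Parentheses make the value visible, so R auto-prints it: [1] 4
(x <- sqrt(16))

# The more familiar, explicit equivalent
x <- sqrt(16)
print(x)
```

The echo=FALSE suggestion is a knitr chunk option: the chunk still runs and its output is shown, but its source code is hidden in the rendered README.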

goergen95 commented 5 months ago

Released with version v0.6.0