karpfen closed this issue 5 months ago
Great proposal, and I absolutely support it! One thing to keep in mind: if we implement #232, there is no longer a technical need for the register functions, since the arguments as well as the resource/indicator functions are provided to the backend at user run time. I think this actually benefits your proposal, because we could still keep the register functionality, now with the sole purpose of producing a nice overview for users, e.g. by default returning a tibble with yet-to-be-determined variables. Also, experienced users who want to extend the backend with their own resources/indicators would no longer be required to use the register functions if they have no need for them.
Hi @karpfen, I wanted to revive this as I am currently reworking the register functionality as mentioned in my last comment. I would appreciate your input on what information you think would be valuable to convey to users of the package via available_*(). I would need your feedback before the end of the month, as this is the deadline for announcing that the dev branch is available for testing.
Hi @goergen95, in general, I have relatively simple tables in mind that don't exhaustively contain all information on all indicators but rather give a more compact overview about what's possible with the package.
Right now, the available_* functions contain all the information, but you kind of have to know that you need to start with something like names(available_indicators()) and then continue with available_indicators()$indicatorXYZ from there.
I think a solution that lists the resources/indicators in a single table, from which the user can then dig deeper with ?indicatorXYZ, would make these resources more accessible and discoverable in the end.
For how to achieve this, I see two options.
1. register_* calls.
   - available_resources() generates a table listing the @name, @format, @reference, and @source of all documentation with @docType == 'data' and @keywords == 'resource'.
   - available_indicators() generates a table listing the @name of the indicator and, if somehow possible, the @name of the related resource, of all documentation with @docType == 'data' and @keywords == 'indicator'.
2. register_indicator(): Here I'm basically imagining the approach from the initial post. It just lists the contents that have been passed to the register_* functions in a table rather than a list. I'm imagining that this is technically easier, although it would keep the burden of filling the register_* functions on the developer.

What do you think, does that make sense?
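The second option (listing whatever was passed to the register_* functions in a table rather than a list) could be sketched roughly as below. This is only an illustration in base R: the `registry` list, its field names, and the `registry_to_table()` helper are hypothetical stand-ins and not the package's actual internals.

```r
# Hypothetical registry: a named list with one entry per registered resource.
# The fields (type, source, licence) are assumptions made for illustration.
registry <- list(
  gfw_treecover = list(
    type = "raster",
    source = "https://data.globalforestwatch.org/documents/tree-cover-2000/explore",
    licence = "CC-BY 4.0"
  ),
  gfw_lossyear = list(
    type = "raster",
    source = "https://data.globalforestwatch.org/documents/tree-cover-loss/explore",
    licence = "CC-BY 4.0"
  )
)

# Flatten the named list into a table with one row per registered entry.
registry_to_table <- function(registry) {
  rows <- lapply(names(registry), function(nm) {
    as.data.frame(c(list(name = nm), registry[[nm]]), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

overview <- registry_to_table(registry)
print(overview)
```

The developer still has to fill in the registry entries by hand, which is the burden mentioned above, but the user gets a single table instead of a nested list.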
Thanks, a version is now up on dev at revision https://github.com/mapme-initiative/mapme.biodiversity/commit/69008e3e80995f8a1b017f0f2b180e8a9c879821. I opted for the non-automated way for now. available_resources() returns a flat tibble with columns name, type, source, and licence. available_indicators() now returns a tibble with columns name, description, and resources, where resources is a nested list column holding the required resources as returned by available_resources().
> available_resources()
# A tibble: 23 × 4
name type source licence
<chr> <chr> <chr> <chr>
1 chirps raster https://www.chc.ucsb.edu/data/chirps CC - u…
2 esalandcover raster https://registry.opendata.aws/esa-worldcover-vito/ CC-BY …
3 fritz_et_al raster https://zenodo.org/record/7997885/ CC-BY …
4 gfw_emissions raster https://data.globalforestwatch.org/datasets/gfw::forest-greenhouse-gas-emission… CC-BY …
5 gfw_lossyear raster https://data.globalforestwatch.org/documents/tree-cover-loss/explore CC-BY …
6 gfw_treecover raster https://data.globalforestwatch.org/documents/tree-cover-2000/explore CC-BY …
7 global_surface_water_change raster https://global-surface-water.appspot.com/download https:…
8 global_surface_water_occurrence raster https://global-surface-water.appspot.com/download https:…
9 global_surface_water_recurrence raster https://global-surface-water.appspot.com/download https:…
10 global_surface_water_seasonality raster https://global-surface-water.appspot.com/download https:…
# ℹ 13 more rows
# ℹ Use `print(n = ...)` to see more rows
> available_indicators()
# A tibble: 26 × 3
name description resources
<chr> <chr> <list>
1 active_fire_counts Number of detected fires by NASA FIRMS <tibble [1 × 4]>
2 active_fire_properties Extraction of properties of fires detected by NASA FIRMS <tibble [1 × 4]>
3 biome Areal statistics of biomes from TEOW <tibble [1 × 4]>
4 deforestation_drivers Areal statistics of deforestation drivers <tibble [1 × 4]>
5 drought_indicator Relative wetness statistics based on NASA GRACE <tibble [1 × 4]>
6 ecoregion Areal statstics of ecoregions based on TEOW <tibble [1 × 4]>
7 elevation Statistics of elevation based on NASA SRTM <tibble [1 × 4]>
8 fatalities Number of fatalities by group of conflict based on UCDP GED <tibble [1 × 4]>
9 gsw_change Statistics of the surface water change layer by JRC <tibble [1 × 4]>
10 gsw_occurrence Areal statistic of surface water based on occurrence threshold <tibble [1 × 4]>
# ℹ 16 more rows
# ℹ Use `print(n = ...)` to see more rows
> available_indicators("treecover_area")[["resources"]]
[[1]]
# A tibble: 2 × 4
name type source licence
<chr> <chr> <chr> <chr>
1 gfw_lossyear raster https://data.globalforestwatch.org/documents/tree-cover-loss/explore CC-BY 4.0
2 gfw_treecover raster https://data.globalforestwatch.org/documents/tree-cover-2000/explore CC-BY 4.0
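To illustrate how the nested resources column can be used, here is a small sketch that flattens it into indicator/resource pairs. It uses a base R data.frame that only mimics the shape of the output above, so it runs without the package. The second indicator and its resource are made-up placeholders.

```r
# Mock of the available_indicators() shape: one row per indicator with a
# list column of resource tables. "example_indicator" and "example_resource"
# are hypothetical placeholders, not real package entries.
indicators <- data.frame(
  name = c("treecover_area", "example_indicator"),
  stringsAsFactors = FALSE
)
indicators$resources <- list(
  data.frame(name = c("gfw_lossyear", "gfw_treecover"), type = "raster",
             stringsAsFactors = FALSE),
  data.frame(name = "example_resource", type = "raster",
             stringsAsFactors = FALSE)
)

# Build one row per (indicator, required resource) pair.
pairs <- do.call(rbind, lapply(seq_len(nrow(indicators)), function(i) {
  data.frame(
    indicator = indicators$name[i],
    resource = indicators$resources[[i]]$name,
    stringsAsFactors = FALSE
  )
}))
print(pairs)
```

With the real tibble, the same result could presumably be obtained more concisely with tidyr's unnest(), at the cost of an extra dependency.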
@karpfen: another question - is it within the scope of this issue to rename resources/indicators?
I'm all for it. I think this is a good opportunity to make the naming a bit more stringent.
Agreed, I would also like to reconsider how we name resources/indicators. I will post a proposal in this issue later this week, and we can give others some time to respond. I could imagine putting up a first draft towards the end of April and merging it into main with the second milestone towards the end of May (see #240).
Just reconsidered: I think it's best to limit the scope of this issue to the already implemented rework of the available_*() functions and open a new issue for renaming resources/indicators.
after discussion with @Jo-Schie this could be further improved for discoverability for new users:
@Jo-Schie and @karpfen: could you please look at the GitHub-rendered version of the README and leave some feedback?
Cool, this is much more legible than before.
One point regarding the code example: the code is wrapped in parentheses (starting at https://github.com/mapme-initiative/mapme.biodiversity/blob/dev/README.Rmd#L111). This only saves us a print() statement, right?
I'd actually remove them, just because it may look unfamiliar to a lot of people.
Maybe the printing could even be done in a separate chunk with echo=FALSE, then it looks even cleaner. What do you think, @goergen95?
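For readers unfamiliar with the idiom discussed here, a minimal base R sketch of what the wrapping parentheses do. The variable names are arbitrary.

```r
# An assignment returns its value invisibly, so nothing is printed.
x <- c(1, 2, 3)

# Wrapping the assignment in parentheses makes the value visible,
# so it is auto-printed at top level, as if print(y) had been called.
(y <- c(1, 2, 3))

# withVisible() reports whether an expression's value would be auto-printed.
plain <- withVisible(z <- 1)$visible      # FALSE
wrapped <- withVisible((z <- 1))$visible  # TRUE
```

Dropping the parentheses and printing explicitly is equivalent and arguably easier to read for newcomers.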
Released with version v0.6.0
Background
With increasing numbers of resources and indicators it becomes harder for users to keep an overview of available indicators.
In light of #232, it would also make sense to align resource and indicator names. I think in general it is nice to stick to the convention of using a source name (e.g. the data-providing organization) followed by the indicator (e.g. gfw_treecover) wherever possible.
One thing to consider, though: users may not know what the acronyms gfw, gsw, gmw, ... stand for and need a convenient way to figure it out. Which is why I suggest we provide a...
Clear, readable table of available resources and indicators
available_indicators() and available_resources() do their job, but I'd like to have something more legible in a nice tabular form. I'm imagining something similar to this:
Maybe with one additional column that's a bit more descriptive than just the abbreviated indicator names.
To dos
Based on the points above, I suggest these action points. Anything is up for discussion of course.