mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/
GNU General Public License v3.0

:loudspeaker: Announcement: Upcoming changes for `{mapme.biodiversity}` #240

Open goergen95 opened 4 months ago

goergen95 commented 4 months ago

Dear MAPME friends,

today we are happy to make some announcements about the future direction of developments for {mapme.biodiversity}. Before diving into the details let me thank you for choosing this package and for the various contributions we were lucky to receive from you during the last years! :tada:

In order to adapt to new, challenging, and diverse use cases, and in the hope of serving an even wider community, some major changes to the package are in order. This issue informs you about the upcoming changes as we move towards a 1.0 release. It gives a detailed overview of what changes you can expect in the next several months and when to expect them. We encourage you to adapt to these changes early on, and ask you to share your feedback with us along the way.

The overall vision of the proposed changes is to provide you with a package that is versatile in its application environments, delivering clearly structured outputs that can be serialized to different data formats with ease, while providing a clear and idiomatic interface.

[!Important] This issue will be used to inform you about progress: we will post updates here once we reach each milestone. Note that details of the proposed changes or specific implementation questions are best discussed in separate issues.

Getting from the current state to this vision means that we will introduce some breaking changes along the way. We want to inform you ahead of time so that you can plan accordingly.

Schedule of the milestones

We ordered the milestones so that the most severe changes happen upfront. That way, once you have adapted your workflows to one milestone, adapting to the next should be less disruptive.

| Milestone | Merge on Main | Release to CRAN |
|---|---|---|
| User-Interface | End of March | End of April |
| Standardized Output | End of April | End of May |
| GDAL Backend | End of May | End of June |

: Schedule for the proposed milestones.

[!Important] Please note that this is a preliminary schedule. While we will not shorten the overall time frame, it might take us longer than expected to implement the indicated changes.

Milestone 1: Cleaner Interface Using Closures

This milestone sets out to provide you with a cleaner interface that gives instant feedback if arguments are wrongly specified. For this, we are going to use closures, i.e. functions that return other functions. The arguments that are important for you as a user to control the functionality are exposed at the outer level. This will make a call to fetch some resources and the subsequent calculation of indicators look something like this:

aoi <- get_resources(aoi, get_nasa_srtm(), get_gmw(years = 2010)) 
aoi <- calc_indicators(aoi, calc_elevation(stat = "mean"), calc_mangroves_area())

This also means that it will be easier to access the help pages for a resource/indicator, because these will be associated with the function names you are actually using, e.g.:

?get_nasa_srtm
?calc_elevation

Arguments will be checked for correctness immediately, and you will be informed about any misspecifications. This interface will also make it easier to add custom resources/indicators ad hoc, for those who require this functionality.
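As a rough illustration of the closure idea, the following sketch shows how an outer function can validate its arguments immediately and return an inner function that does the actual work later. `calc_example_stat()` and its set of statistics are hypothetical and only serve this example; they are not the package's implementation.

```r
# Hypothetical constructor, for illustration only (not part of the package).
calc_example_stat <- function(stat = "mean") {
  # Validation happens as soon as the outer function is called.
  stat <- match.arg(stat, c("mean", "median", "min", "max"))
  fun <- switch(stat,
    mean = mean,
    median = stats::median,
    min = min,
    max = max
  )
  # The returned closure remembers `fun` and `stat` from its enclosing call.
  function(values) {
    stopifnot(is.numeric(values))
    out <- fun(values, na.rm = TRUE)
    names(out) <- stat
    out
  }
}

# A valid specification returns a function that can be applied later.
elevation_mean <- calc_example_stat(stat = "mean")
result <- elevation_mean(c(100, 200, NA, 300))

# An invalid specification fails right away, before any data is touched.
msg <- tryCatch(
  calc_example_stat(stat = "average"),
  error = function(e) conditionMessage(e)
)
```

Because the check runs in the outer call, a typo in an argument surfaces when the indicator is configured rather than deep inside a long-running computation.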

Milestone 2: Standardized Indicator Output and Serialization Options

With this milestone we will have revised all indicator functions to return a standardized output format. The envisaged output format is inspired by the MovingFeatures standard and differentiates between simple and temporal properties. Simple properties are 1-dimensional attribute values of features that we are most familiar with from various GIS software. However, some of our indicators actually have a temporal axis. We will harness the fact that we already use nested list columns to represent our indicator data in R. However, we will standardize the output of all temporal indicators to a common format, e.g. along the lines of the following example output:

## # A tibble: 76 × 6
##       datetimes            variable  unit   value
##       <chr>                <chr>     <chr>  <dbl>
##  1    2000-01-01T00:00:00Z treecover ha    12089.
##  2    2001-01-01T00:00:00Z treecover ha    12075.
##  3    2002-01-01T00:00:00Z treecover ha    12053.
##  4    2003-01-01T00:00:00Z treecover ha    11978.
##  5    2004-01-01T00:00:00Z treecover ha    11926.
##  6    2005-01-01T00:00:00Z treecover ha    11877.
##  7    2006-01-01T00:00:00Z treecover ha    11851.
##  8    2007-01-01T00:00:00Z treecover ha    11800.
##  9    2008-01-01T00:00:00Z treecover ha    11780.
## 10    2009-01-01T00:00:00Z treecover ha    11758.

This makes the indicator output format much more predictable for downstream applications. It also allows us to supply you with seamless serialization functions for GeoPackage, GeoJSON, and even MovingFeatures JSON.
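To make the nested layout more tangible, here is a minimal base R sketch of a portfolio whose indicator column holds one table per asset in the common format shown above. The helper `make_indicator()`, the column name `assetid`, and the values for the second asset are assumptions made for this example; geometries are left out for brevity.

```r
# Hypothetical helper that builds one indicator table in the common layout.
make_indicator <- function(years, values, variable, unit) {
  stopifnot(length(years) == length(values))
  data.frame(
    datetimes = sprintf("%d-01-01T00:00:00Z", years),
    variable = variable,
    unit = unit,
    value = values,
    stringsAsFactors = FALSE
  )
}

# A toy portfolio with two assets.
portfolio <- data.frame(assetid = 1:2)

# The indicator is stored as a list column: one table per asset.
portfolio$treecover <- list(
  make_indicator(2000:2002, c(12089, 12075, 12053), "treecover", "ha"),
  make_indicator(2000:2002, c(530, 528, 521), "treecover", "ha")
)

# Flatten the nested column into one long table for downstream use.
long <- do.call(
  rbind,
  Map(
    function(id, ind) data.frame(assetid = id, ind),
    portfolio$assetid,
    portfolio$treecover
  )
)
```

Since every temporal indicator shares the same four columns, the same flattening step works regardless of which indicator is stored in the list column.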

We will also take the opportunity to revise and optimize the indicator functions for better overall performance. Additionally, we will add support for multi-polygon geometries by supplying a mechanism for selecting aggregation functions for indicators.

Milestone 3: Routing data I/O through GDAL

In its current state, the package downloads data to the local file system. This severely limits the environments in which the package can be used efficiently. We received several requests to support, for example, different types of cloud storage. Manually maintaining drivers to read from and write to commercial cloud storage is not something we are able to do. The good news, however, is that GDAL already supports major cloud storage providers via its Virtual Filesystem drivers. With this milestone, we will leverage GDAL's capabilities to read and write geodata in a huge range of formats and from a huge range of sources. This will allow users of mapme.biodiversity to run their applications on a cloud provider of their choice. We will supply configuration options that ease authentication, e.g. against cloud storage attached to a cloud compute instance. Thus, the interface to pull data from the internet to an S3 cloud storage will look something like this:

mapme_options(
  outdir = "/vsis3/my-s3-bucket",
  gdal_opts = gdal_s3_opts(
    AWS_ACCESS_KEY_ID = "my-aws-key",
    AWS_S3_ENDPOINT = "https://my-bucket.af-south-1.amazonaws.com"
  )
)

get_resources(aoi, get_nasa_srtm())

Leveraging GDAL for data transfer will also allow us to translate data to cloud-optimized formats that should increase computation speeds further down the line.

In case a specific resource is already provided in a cloud-optimized data format on a low-latency server, you might also decide to skip the download step altogether. This will most likely only be efficient for small to medium-sized portfolios, and since we do not control how resources are provided, there will be limitations on which resources support such a workflow.
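GDAL addresses remote data through the path prefixes of its virtual file systems, for example `/vsis3/` for S3 buckets, `/vsigs/` for Google Cloud Storage, and `/vsicurl/` for plain HTTP(S) sources. The sketch below only shows how such paths are formed. `to_vsi_path()` is a hypothetical helper, not a function of the package, and the bucket name and URL are placeholders.

```r
# Hypothetical helper: translate a URI into a GDAL virtual file system path.
to_vsi_path <- function(uri) {
  stopifnot(is.character(uri), length(uri) == 1)
  if (grepl("^s3://", uri)) {
    # S3 object storage
    sub("^s3://", "/vsis3/", uri)
  } else if (grepl("^gs://", uri)) {
    # Google Cloud Storage
    sub("^gs://", "/vsigs/", uri)
  } else if (grepl("^https?://", uri)) {
    # Generic HTTP(S) access; the full URL is kept after the prefix
    paste0("/vsicurl/", uri)
  } else {
    # Local paths are passed through unchanged
    uri
  }
}

s3_path <- to_vsi_path("s3://my-s3-bucket/srtm/tile.tif")
http_path <- to_vsi_path("https://example.org/data/tile.tif")
local_path <- to_vsi_path("/data/srtm/tile.tif")
```

A GDAL-based reader can then open such a path like a local file, provided the required credentials are configured for the respective storage.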

Development of new resources/indicators

[!Important] We expect this transition to span approximately four months. During this period, we will not include new resources/indicators in {mapme.biodiversity}, but will limit our activity on this site to fixing bugs in existing resources/indicators.

However, in the meantime, we will work on new resources/indicators at {mapme.indicators}.

After all milestones have been achieved, we will decide whether separating the backend from the concrete resource/indicator implementations makes sense for the future. If the answer is yes, all resources/indicators will eventually be migrated to {mapme.indicators}. If the answer is no, we will migrate the newly developed indicators into {mapme.biodiversity}. As always, you are invited to share your feedback along the way.

fBedecarrats commented 4 months ago

This is great news, congratulations! Two questions:

Thanks in advance for the feedback!

goergen95 commented 4 months ago

Thanks for your feedback and questions!

> what about discussions regarding possible further splitting between mapme.* packages?

As indicated, that is not a settled issue yet and we are happy to receive your feedback. However, if we were to split the package, the casual user would call library(mapme.indicators), and since that package depends on the backend package, that would be all you would need to change in your workflows (though we would most probably need to get it published on CRAN as well). The backend package itself would then only be of interest to more involved contributors.

> How would you recommend to handle the transition?

I cannot give specific recommendations on this, as the best way to handle it will depend on your context. You will be able to install all previously published versions from CRAN during the process (so you could decide to conduct your most urgent work with version 0.5 via remotes::install_version("mapme.biodiversity", version = "0.5")). As you can see, we also plan to send updates to CRAN when reaching each specific milestone. To fully benefit from the upcoming changes, however, I recommend adapting to the new interface as early as possible.

goergen95 commented 3 months ago

Today, we are happy to announce that the development branch including the latest development of our first milestone Cleaner Interface Using Closures is ready for testing! :tada:

Please review NEWS.md for a quick overview of the proposed changes.

You can install the package from the main branch via:

remotes::install_github("mapme-initiative/mapme.biodiversity")

We also provide a ready-to-use docker image that is rebuilt every day with the latest changes on the main branch. To pull the image and run an RStudio instance locally on localhost:8787, run:

docker pull ghcr.io/mapme-initiative/mapme-spatial-dev:1.0
docker run --rm -p 8787:8787 -e PASSWORD=supersecret ghcr.io/mapme-initiative/mapme-spatial-dev:1.0

We advise you to adapt to the new UI as early as possible and also ask you to provide your feedback via dedicated issues.

As a reminder of the time schedule, we aim to send a new release to CRAN towards the end of April.

goergen95 commented 2 months ago

To ease the development process and the discoverability of the milestones, we slightly changed the process. The milestones will now be developed on a dev branch and published on the main branch as early as possible. After a period of one month, a new CRAN release will follow. The comments above were adjusted to reflect the changed process.

goergen95 commented 2 months ago

{mapme.biodiversity} v0.6.0 has just been released and should be available in the coming days from CRAN. This release includes the updated user-interface for querying resources and indicators. The release notes contain additional information.

goergen95 commented 2 months ago

The latest changes of our second milestone Standardized Indicator Output and Serialization Options are ready for testing on the main branch.

Please review NEWS.md for a quick overview of the proposed changes.

You can install the package from the main branch via:

remotes::install_github("mapme-initiative/mapme.biodiversity")

We also provide a ready-to-use docker image that is rebuilt every day with the latest changes on the main branch. To pull the image and run an RStudio instance locally on localhost:8787, run:

docker pull ghcr.io/mapme-initiative/mapme-spatial-dev:1.0
docker run --rm -p 8787:8787 -e PASSWORD=supersecret ghcr.io/mapme-initiative/mapme-spatial-dev:1.0

We advise you to adapt to the output format as early as possible and also ask you to provide your feedback via dedicated issues.

As a reminder of the time schedule, we aim to send a new release to CRAN towards the end of May.

goergen95 commented 1 month ago

{mapme.biodiversity} v0.7.0 has just been released and should be available in the coming days from CRAN. This release includes standardized outputs for indicators as well as a chunking approach that allows you to supply assets of type 'MULTIPOLYGON'. The release notes contain additional information.

goergen95 commented 3 weeks ago

The latest changes of our third and last milestone Routing data I/O through GDAL are ready for testing on the main branch! :tada:

Please review NEWS.md for a quick overview of the proposed changes.

You can install the package from the main branch via:

remotes::install_github("mapme-initiative/mapme.biodiversity")

We also provide a ready-to-use docker image that is rebuilt every day with the latest changes on the main branch. To pull the image and run an RStudio instance locally on localhost:8787, run:

docker pull ghcr.io/mapme-initiative/mapme-spatial-dev:1.2.0
docker run --rm -p 8787:8787 -e PASSWORD=supersecret ghcr.io/mapme-initiative/mapme-spatial-dev:1.2.0

We advise you to adapt to the output format as early as possible and are also asking you to provide your feedback via dedicated issues.

As a reminder of the time schedule, we aim to send a new release to CRAN towards the end of June.

goergen95 commented 5 days ago

{mapme.biodiversity} v0.8.0 has just been released and should be available in the coming days from CRAN. This release includes a GDAL-based backend that allows you to seamlessly integrate diverse cloud storage solutions into your workflows, as well as the calculation of indicators without downloading persistent data (only recommended for small portfolios). The release notes contain additional information.