mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/dev
GNU General Public License v3.0
33 stars 7 forks source link

Updates write/read portfolio to serialze to/from two-table geopackage #294

Closed goergen95 closed 3 months ago

goergen95 commented 3 months ago

FYI @karpfen, as discussed yesterday this PR re-introduces read_portfolio() and allows us now to serialize to GPKG with two tables (metadata and indicators) and to seamlessly re-construct an sf object from such files. One could serialize to other formats with combining portfolio_long() or portfolio_wide() with st_write(), but we do not supply routines to re-construct the portfolio object from such files.

Would you be able to test this quickly? I would like to include this in the next CRAN release later this week.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 83.33333% with 5 lines in your changes missing coverage. Please review.

Project coverage is 85.08%. Comparing base (bb39f42) to head (6cba567).

Files Patch % Lines
R/portfolio.R 83.33% 5 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #294 +/- ## ========================================== - Coverage 85.17% 85.08% -0.10% ========================================== Files 50 50 Lines 2239 2259 +20 ========================================== + Hits 1907 1922 +15 - Misses 332 337 +5 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

karpfen commented 3 months ago

Cool, I'll have a look!

karpfen commented 3 months ago

@goergen95: Very cool functionality! I just had a go and it all works in general, but I found some potential issues:

  1. Reading and writing always plots the messages produced by st_write, regardless of the mapme_options(verbose) setting; e.g.

    > mapme_options(verbose = FALSE)
    > write_portfolio(aoi, "pf.gpkg")
    Writing layer `metadata' to data source `pf.gpkg' using driver `GPKG'
    Writing 6 features with 2 fields and geometry type Multi Polygon.
    Updating layer `indicators' to data source `pf.gpkg' using driver `GPKG'
    Writing 7 features with 6 fields without geometries.
  2. read_portfolio() produces a warning message that may be confusing. This is probably produced by reading the geometry-less table, but to the user it seems like there's something wrong even though all is working well:

    read_portfolio("pf.gpkg")
    Reading layer `metadata' from data source `C:\R\mapme.biodiversity\pf.gpkg' using driver `GPKG'
    Simple feature collection with 6 features and 2 fields
    Geometry type: MULTIPOLYGON
    Dimension:     XY
    Bounding box:  xmin: -100.2395 ymin: 15.9626 xmax: -97.10138 ymax: 19.56647
    Geodetic CRS:  WGS 84
    Reading layer `indicators' from data source `C:\R\mapme.biodiversity\pf.gpkg' using driver `GPKG'
    Warning message:
    no simple feature geometries present: returning a data.frame or tbl_df 
  3. If the portfolio result contains NULL results, read_portfolio fills it with NA instead

    
    aoi <- sf::read_sf("C:/R/mapme.ETL/biodiversity_indicators/mexico_all_indicators/mexico.gpkg") %>%
    dplyr::select(WDPAID)

aoi <- aoi[5:6, ] %>% get_resources( get_gmw(years = c(2015, 2016)) ) %>% calc_indicators( calc_mangroves_area() )

write_portfolio(aoi, "pf.gpkg") aoi2 <- read_portfolio("pf.gpkg")


When printing the two you can see the difference:
```r
all(st_as_text(aoi$geom) ==  st_as_text(aoi2$geom)) # TRUE

> aoi$mangroves_area
[[1]]
NULL

[[2]]
# A tibble: 2 × 4
  datetime   variable  unit  value
  <date>     <chr>     <chr> <dbl>
1 2015-01-01 mangroves ha    2584.
2 2016-01-01 mangroves ha    2547.

> aoi2$mangroves_area
[[1]]
# A tibble: 1 × 4
  datetime variable unit  value
  <date>   <chr>    <chr> <dbl>
1 NA       NA       NA       NA

[[2]]
# A tibble: 2 × 4
  datetime   variable  unit  value
  <date>     <chr>     <chr> <dbl>
1 2015-01-01 mangroves ha    2584.
2 2016-01-01 mangroves ha    2547.

to me, points 1 and 2 are not super critical, but point 3 potentially messes with workflows down the line. What do you think?

goergen95 commented 3 months ago

Thanks for taking the time to review!

Concerning 1:

... can be used to supply arguments to st_write() (as documented). Thus write_portfolio(aoi, "pf.gpkg", quiet = TRUE) would silence the messages.

Concerning 2:

Yep, maybe we could try to catch that and not print it. We would, however, need to print other warnings that might occur.

Concerning 3:

Totally agree that this is critical. I just found out about tidyr::unnest's keep_empty argument. Maybe we can still leverage that, do another sweep over the data and replace tibbles with only NAs with NULL?

goergen95 commented 3 months ago

Ok, easiest way seems to be to use write_sf() and read_sf(). This should take care of the verbose output.

goergen95 commented 3 months ago

Ok, I just updated the PR according to your suggestions. Would you have another look?

karpfen commented 3 months ago

Cool, thanks for the fixes! I couldn't find any more problematic cases, so merging now.