pik-piam / madrat

R package | May All Data be Reproducible and Transparent (MADRaT)
https://cran.r-project.org/web/packages/madrat/index.html
Other
14 stars 35 forks source link

madrat caching dependent on attached packages #219

Closed 0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q closed 3 hours ago

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 3 hours ago

This bug does not affect input data generation currently, but has the potential to screw it up in intractable ways.

For context, see pik-piam/mrremind#573, pik-piam/GDPuc#38 and this discussion.

Two facts can conspire to break the madrat caching for input data preparation.

  1. Packages use
    .onAttach <- function(libname, pkgname) {
       madrat::madratAttach(pkgname)
    }

    to be discovered by madrat and have their functions included in the fingerprinting for cache invalidation.

  2. The RSE R Programming Standards state that

    in general package::fun() should be used instead of roxygen's @importFrom

Using plausible changes to the mrdriver package [↑] as an example, GDPuc is loaded, but never attached. (So this is wrong.) Consequently, madrat caching ingnores toolConvertGDP() in the fingerprinting, possibly leading to corrupted caches.

devtools::load_all()

fp <- madrat:::fingerprint('calcGDP', details = TRUE)
# Warning messages:
# …
# 6: In getMadratGraph(...) :
#   Following functions could not be found in the scope of packages to be checked.: 
#    calcEdgeTransportSA->readEDGETransport, toolConvertGDP->calcGDP, toolConvertGDP->calcGDPpc, toolConvertGDP->calcInternalGDPFutureSSP2EU, toolConvertGDP->calcInternalGDPFutureSSPs, toolConvertSingle->calcInternalGDPFutureSSPs, toolConvertGDP->calcInternalGDPMI, toolConvertGDP->calcInternalGDPPastEurostat, toolConvertGDP->calcInternalGDPPastWDI
#   Please make sure that they exist and adjust the scope of packages accordingly!

as.vector(fp)
# [1] "32785696"

attr(fp, 'details')[grepl('Convert', names(attr(fp, 'details')))]
#      UNKNOWN:::toolConvertGDP    UNKNOWN:::toolConvertSingle mrdrivers:::toolGeneralConvert 
#                            NA                             NA                     "5d6a17ee" 

Yes, there is a warning. But the warnings of the input data generation are ignored as a matter of policy, and truncated to 50 in any case.

Actually attaching, not only loading GDPuc yields different result:

library(GDPuc)

fp2 <- madrat:::fingerprint('calcGDP', details = TRUE)

as.vector(fp2)
# [1] "7c09e366"

attr(fp2, 'details')[grepl('Convert', names(attr(fp2, 'details')))]
#        GDPuc:::toolConvertGDP      GDPuc:::toolConvertSingle mrdrivers:::toolGeneralConvert 
#                    "84d0659f"                     "73e1e1bf"                     "5d6a17ee" 

This is not an issue currently for input data generation, as GDPuc is imported explicitly in several packages (mrcommons, mrfactors, mrfaocore, mrland, mrremind, mrvalidation, piamInterfaces). But should that be changed, caches will rot.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 3 hours ago

As @fbenke-pik discovered, the situation is somewhat different. madrat only fingerprints functions if their package is registered through madratAttach(), which all packages do during attachment, and packages are only attached when they are Depends dependencies of packages that are being attached themselves. Otherwise they are only loaded. Two possible solutions present themselves:

  1. Make all "madrat-verse" packages Depends dependencies, with the ensuing bloated search paths.
  2. Have the packages call madratAttach() during .onLoad().
tscheypidi commented 3 hours ago

The first is actually the policy we currently have in place. I think the 2nd solution does not work as madratAttach() in some cases needs to happen before .onLoad is being run.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 hour ago

The first is actually the policy we currently have in place.

As in "an undocumented policy in contradiction to the stated 'R Language Rules'"?

I think the 2nd solution does not work as madratAttach() in some cases needs to happen before .onLoad is being run.

.onLoad() runs before .onAttach().