pik-piam / madrat

R package | May All Data be Reproducible and Transparent (MADRaT)
https://cran.r-project.org/web/packages/madrat/index.html
Other
14 stars 36 forks source link

Trying to guess how toolGetMapping() is run leads to false negatives in getCode() #209

Closed 0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q closed 4 months ago

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 4 months ago

https://github.com/pik-piam/madrat/blob/c4ba674c467269b9a33104fe436c4c4402c26487/R/getCode.R#L57

In my opinion it is not helpful to try to reason about code by evaluating it […] separate from its execution environment. All instances of .warnNotfound(getMappings) I found [for everything loaded via mrremind] (four times mredgebuildings, and mrremind::calcIO()) are false negatives where the mapping name is composited. This encourages repetitive, therefore redundant, therefore error-prone code.

via pik-piam/mrremind#503 Also, toolGetMapping() has perfectly good error messages. https://github.com/pik-piam/madrat/blob/c4ba674c467269b9a33104fe436c4c4402c26487/R/toolGetMapping.R#L44

tscheypidi commented 4 months ago

Evaluation is required as the caching mechanism needs to know whether the mappings are still the same or not. I do agree that this is not ideal but I don't see any workaround here.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 4 months ago

For one thing: the warning is still pointless, and should be removed in my opinion. The proof is in the pudding: either the mapping can be retrieved, or it cannot and toolGetMapping() will throw an error. Warnings about mappings that cannot be found when one looks for them the wrong way are just spam, which @fbenke-pik tries to get rid of.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 4 months ago

For another: I am puzzled how fingerprint() handles these evaluation errors. Trying with mrremind:::calcIO() (v0.179.2, so prior to pik-piam/mrremind#503) it seems that the missing mapping is simply ignored when calculating the finger print. Which begs the question: what is the point? The only sensible course to seems to fail finger printing and therefore not caching the function call. Also, fingerprint() straight-up ignores about half the components making up calcIO()? My confidence in the cache mechanism is not improving.

library(mrremind)
# Loading required package: edgeTransport
# Loading required package: data.table
# data.table 1.15.4 using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com
# Loading required package: madrat
# Loading required package: magclass
# Loading required package: mrcommons
# Loading required package: mrdrivers
# Loading required package: mrfaocore
# Loading required package: mstools
# Loading required package: mrlandcore

madrat:::getCode('mrremind') -> code
# Warning message:
# In .warnNotfound(getMappings) :
#   Mapping in mrremind:::calcIO not found!

attr(code, 'mappings')[names(attr(code, 'mappings')) == 'mrremind:::calcIO']
# $`mrremind:::calcIO`
# [1] "NOTFOUND"

(x <- attr(madrat:::fingerprint('calcIO', package = 'mrremind', details = TRUE), 'details'))
#                                                                                   EDGETransport 
#                                                                                      "9084d579" 
#                                                                                             IEA 
#                                                                                      "1bbd11b9" 
#                                                                                       JRC_IDEES 
#                                                                                      "43f6dd94" 
#  /p/projects/rd3mod/inputdata/mappings/reportingVariables/Mapping_JRC_IDEES_REMIND_Industry.csv 
#                                                                                      "b9a025d5" 
#    /p/projects/rd3mod/inputdata/mappings/reportingVariables/Mapping_JRC_IDEES_REMIND_ResCom.csv 
#                                                                                      "acd04128" 
# /p/projects/rd3mod/inputdata/mappings/reportingVariables/Mapping_JRC_IDEES_REMIND_Transport.csv 
#                                                                                      "e9cc0831" 
#                                                                         UNKNOWN:::calc1stBioDem 
#                                                                                              NA 
#                                                                               UNKNOWN:::calcGDP 
#                                                                                              NA 
#                                                                   UNKNOWN:::calcIOEdgeBuildings 
#                                                                                              NA 
#                                                                  UNKNOWN:::calcgenerateEDGEdata 
#                                                                                              NA 
#                                                                               UNKNOWN:::readIEA 
#                                                                                              NA 
#                                                                         UNKNOWN:::readJRC_IDEES 
#                                                                                              NA 
#                                                                         UNKNOWN:::toolAggregate 
#                                                                                              NA 
#                                                                       UNKNOWN:::toolCountryFill 
#                                                                                              NA 
#                                                                        UNKNOWN:::toolGetMapping 
#                                                                                              NA 
#                                                                       mrremind:::calcEDGETrData 
#                                                                                      "f1b61714" 
#                                                                               mrremind:::calcIO 
#                                                                                      "be248f2d" 
#                                                                        mrremind:::calcJRC_IDEES 
#                                                                                      "8ed97d9b" 
#                                                                    mrremind:::readEDGETransport 
#                                                                                      "8696d63a" 
#                                            mrremind:::tool_fix_IEA_data_for_Industry_subsectors 
#                                                                                      "795fce2c" 
#                                      /p/projects/rd3mod/inputdata/mappings/regionmappingH12.csv 
#                                                                                      "7a4b46da" 
#                        /p/projects/rd3mod/inputdata/mappings/regional/regionmapping_21_EU11.csv 
#                                                                                      "992d15fb" 
# Warning messages:
# 1: In .warnNotfound(getMappings) :
#   Mapping in mrremind:::calcIO not found!
# 2: In getMadratGraph(...) :
#   Following functions could not be found in the scope of packages to be checked.: readMAgPIE->calcBiomassPrices,
# calcFAOLand->calcBiomassPrices, readPWT->calcCapital, calcGDPPast->calcCapital, calcGDP->calcCapital, calcGDPPast->calcCapTarget,
# calcGDP->calcCCScapacity, calcGDP->calcCementShare, readIEA->calcChemicalFeedstocksShare, calcGDP->calcChemicalFeedstocksShare,
# calcGDPPast->calcClinker_to_cement_ratio, calcGDP->calcCostsTradePeFinancial, readEDGAR->calcEconometricEmiParameter,
# calcPopulation->calcEconometricEmiParameter, calcGDP->calcEDGETransport, calcgenerateEDGEdata->calcEDGETrData,
# readEEA_EuropeanEnvironmentAgency->calcEEAGHGProjections, readEEA_EuropeanEnvironmentAgency->calcEffortSharingRefEmi,
# readMAgPIE->calcEmiAirPollLandUse, readMAgPIE->calcEmiCO2LandUse, readEDGAR->calcEmiFossilFuelExtr,
# calcHistEmissions->calcEmiFossilFuelExtr, readEDGAR->calcEmiMac1990, readEurostat->calcEmiMac1990, readEDGAR->calcEmiPollutantExo,
# readCEDS->calcEmiPollutantExo, readEEA_Europ [... truncated]

c(sum(grepl('UNKNOWN', names(x))), length(x))
# [1]  9 22
fbenke-pik commented 4 months ago

I also have some follow-up questions

  1. Does that mean that we must not access any mappings as follows
mapping = "structuremappingIO_inputs.csv"
package = "mrremind"
mapping <- toolGetMapping(type = "sectoral", name = mapping, where = where, returnPathOnly = TRUE)

Otherwise, the hashes of any madrat function using that function directly or indirectly cannot be calculated correctly, because the mapping has hash NA?


I have seen cases like the fingerprint example of Michaja quite a lot when debugging mrremind and edgeTransport caching. Happens also when people write tool functions, but forget to declare the package as madrat package.

  1. What would be the behaviour of fingerprint if a contributing function or resource has the hash NA as in the example above? Will it simply be ignored?

  2. Would it make sense to sound the alarm when a fingerprint cannot be calculated correctly because the hash of at least one of the contributing functions / sources return NA as a hash?

tscheypidi commented 4 months ago
  1. Yes, this formulation is not supported and thereby causes the warning.
  2. If I am not mistaken it ignores the mapping
  3. Isn't the warning about the mapping not covering this case?
0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 4 months ago

Yes, this formulation is not supported and thereby causes the warning.

If so, then that should be reflected somewhere.

foo <- function(where = NULL)
{
    if (is.symbol(substitute(where)))
    {
        stop('Only literal arguments for `where` are supported.')
    }

    return(where)
}

foo('foo')
# [1] "foo"

x <- 'foo'
foo(x)
# Error in foo(x) : Only literal arguments for `where` are supported.
fbenke-pik commented 4 months ago

If so, then that should be reflected somewhere.

I agree with Michaja, if this actually leads to mappings being ignored in the caching logic, using arguments should be explicitly forbidden.

fbenke-pik commented 4 months ago

Isn't the warning about the mapping not covering this case?

You are right, it seems like the warning covers all the cases where a hash would produce NAs. So it turns out the warning is useful and we must pay attention to it when generating inputdata.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 4 months ago

Nobody sees that warning.

$ tar -Oxf /p/projects/rd3mod/inputdata/output/rev6.75APT-2024-05-18_62eff8f7_remind.tgz ./diagnostics.log | sed -n '/WARNING: $/ { N; s/ \n//; }; /WARNING/ { s/~*//g; p }' | sort | uniq -c | sort -k 1nr,1
  11640  WARNING: an argument will be fractionally recycled
   6630  WARNING: collapsing to unique 'x' values
    345  WARNING: error during factorisation of matrix (dgefa);         singular matrix
    345  WARNING: steady-state not reached
    234  WARNING: no non-missing arguments to max; returning -Inf
     78  WARNING: Column 'pkm' does not exist to remove
     78  WARNING: Column 'weight' does not exist to remove
     78  WARNING: getConfig for option "sourcefolder" must not be used from within readSource! Access will be disabled soon!
     75  WARNING: Your dimnames in dim=3 contain duplicates! This might lead to erronous results and bad code performance. Please try to avoid duplicates in dimnames under all circumstances!
     43  WARNING: 'getRegions<-' is deprecated.
      3  WARNING: Data for following unknown country codes removed: ADO, CHI, ZAR, IMY, KSV, ROM, TMP, WBG
      2  WARNING: calcOutput('IOEdgeBuildings', subtype = X), with X in (output_EDGE, output_EDGE_buildings) produces negative values, set to 0
      2  WARNING: NAs introduced by coercion
      1  WARNING: argument 'where' should be set when calling toolGetMapping from within a madrat function.
      1  WARNING: c("Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.", i = "Please use `reframe()` instead.", i = "When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.")
      1  WARNING: Data for following unknown country codes removed: SCG - By using madrat::toolISOhistorical the data for the following countries can usually still be used: SCG
      1  WARNING: Data returned by mrremind:::calcEmissionFactors(...) contains NAs
      1  WARNING: Data returned by mrremind:::calcIndustry_Value_Added(...)
      1  WARNING: Data returned by mrremind:::calcJRC_IDEES(...) contains NAs
      1  WARNING: Duplicate entries found, only the last entry will be used (duplicate entries: GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.startYeBEV.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.startValBEV.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.targetYeBEV.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.targetValBEV.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.startYeICE.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.targetYeICE.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.targetValICE.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.ratioPHEV.ptab4W)!
      1  WARNING: Expecting numeric in O7310 / R7310C15: got 'incl in gov. services'
…
      1  WARNING: Expecting numeric in O7557 / R7557C15: got 'incl in gov. services'
      1  WARNING: getConfig for option "globalenv" must not be used from within readSource! Access will be disabled soon!
      1  WARNING: getConfig for option "packages" must not be used from within readSource! Access will be disabled soon!
      1 WARNING:getConfig must not be used from within retrieveData! Access will be disabled soon!
      1  WARNING: 'luscale::speed_aggregate' is deprecated.
      1  WARNING: More than 50% missing values for: OAS
      1  WARNING: Product/flow combinations not present in mapping added by 
      1 WARNING: setConfig must not be used from within retrieveData!
0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 4 months ago

Also, I doubt anybody who did not diligently read this issue will be aware that Mapping in <some function> not found! equals <some function> may have corrupted caches!.

fbenke-pik commented 4 months ago

Just to clarify: I agree, it is not intuitive and the fact that this has been unnoticed for a long time now calls for a more explicit warning. I wouldn't mind throwing an error in the right places either.

For REMIND inputdata generation specifically, we (RSE) will be taking all warnings seriously in the future, so at least I will notice it. But of course, this is not good enough from the perspective of madrat as a package used by users outside our group.