Closed 0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q closed 5 months ago
Evaluation is required as the caching mechanism needs to know whether the mappings are still the same or not. I do agree that this is not ideal but I don't see any workaround here.
For one thing: the warning is still pointless, and should be removed in my opinion. The proof is in the pudding: either the mapping can be retrieved, or it cannot and toolGetMapping()
will throw an error. Warnings about mappings that cannot be found when one looks for them the wrong way are just spam, which @fbenke-pik tries to get rid of.
For another: I am puzzled how fingerprint()
handles these evaluation errors. Trying with mrremind:::calcIO()
(v0.179.2, so prior to pik-piam/mrremind#503) it seems that the missing mapping is simply ignored when calculating the finger print. Which begs the question: what is the point? The only sensible course to seems to fail finger printing and therefore not caching the function call.
Also, fingerprint()
straight-up ignores about half the components making up calcIO()
? My confidence in the cache mechanism is not improving.
library(mrremind)
# Loading required package: edgeTransport
# Loading required package: data.table
# data.table 1.15.4 using 1 threads (see ?getDTthreads). Latest news: r-datatable.com
# Loading required package: madrat
# Loading required package: magclass
# Loading required package: mrcommons
# Loading required package: mrdrivers
# Loading required package: mrfaocore
# Loading required package: mstools
# Loading required package: mrlandcore
madrat:::getCode('mrremind') -> code
# Warning message:
# In .warnNotfound(getMappings) :
# Mapping in mrremind:::calcIO not found!
attr(code, 'mappings')[names(attr(code, 'mappings')) == 'mrremind:::calcIO']
# $`mrremind:::calcIO`
# [1] "NOTFOUND"
(x <- attr(madrat:::fingerprint('calcIO', package = 'mrremind', details = TRUE), 'details'))
# EDGETransport
# "9084d579"
# IEA
# "1bbd11b9"
# JRC_IDEES
# "43f6dd94"
# /p/projects/rd3mod/inputdata/mappings/reportingVariables/Mapping_JRC_IDEES_REMIND_Industry.csv
# "b9a025d5"
# /p/projects/rd3mod/inputdata/mappings/reportingVariables/Mapping_JRC_IDEES_REMIND_ResCom.csv
# "acd04128"
# /p/projects/rd3mod/inputdata/mappings/reportingVariables/Mapping_JRC_IDEES_REMIND_Transport.csv
# "e9cc0831"
# UNKNOWN:::calc1stBioDem
# NA
# UNKNOWN:::calcGDP
# NA
# UNKNOWN:::calcIOEdgeBuildings
# NA
# UNKNOWN:::calcgenerateEDGEdata
# NA
# UNKNOWN:::readIEA
# NA
# UNKNOWN:::readJRC_IDEES
# NA
# UNKNOWN:::toolAggregate
# NA
# UNKNOWN:::toolCountryFill
# NA
# UNKNOWN:::toolGetMapping
# NA
# mrremind:::calcEDGETrData
# "f1b61714"
# mrremind:::calcIO
# "be248f2d"
# mrremind:::calcJRC_IDEES
# "8ed97d9b"
# mrremind:::readEDGETransport
# "8696d63a"
# mrremind:::tool_fix_IEA_data_for_Industry_subsectors
# "795fce2c"
# /p/projects/rd3mod/inputdata/mappings/regionmappingH12.csv
# "7a4b46da"
# /p/projects/rd3mod/inputdata/mappings/regional/regionmapping_21_EU11.csv
# "992d15fb"
# Warning messages:
# 1: In .warnNotfound(getMappings) :
# Mapping in mrremind:::calcIO not found!
# 2: In getMadratGraph(...) :
# Following functions could not be found in the scope of packages to be checked.: readMAgPIE->calcBiomassPrices,
# calcFAOLand->calcBiomassPrices, readPWT->calcCapital, calcGDPPast->calcCapital, calcGDP->calcCapital, calcGDPPast->calcCapTarget,
# calcGDP->calcCCScapacity, calcGDP->calcCementShare, readIEA->calcChemicalFeedstocksShare, calcGDP->calcChemicalFeedstocksShare,
# calcGDPPast->calcClinker_to_cement_ratio, calcGDP->calcCostsTradePeFinancial, readEDGAR->calcEconometricEmiParameter,
# calcPopulation->calcEconometricEmiParameter, calcGDP->calcEDGETransport, calcgenerateEDGEdata->calcEDGETrData,
# readEEA_EuropeanEnvironmentAgency->calcEEAGHGProjections, readEEA_EuropeanEnvironmentAgency->calcEffortSharingRefEmi,
# readMAgPIE->calcEmiAirPollLandUse, readMAgPIE->calcEmiCO2LandUse, readEDGAR->calcEmiFossilFuelExtr,
# calcHistEmissions->calcEmiFossilFuelExtr, readEDGAR->calcEmiMac1990, readEurostat->calcEmiMac1990, readEDGAR->calcEmiPollutantExo,
# readCEDS->calcEmiPollutantExo, readEEA_Europ [... truncated]
c(sum(grepl('UNKNOWN', names(x))), length(x))
# [1] 9 22
I also have some follow-up questions
mapping = "structuremappingIO_inputs.csv"
package = "mrremind"
mapping <- toolGetMapping(type = "sectoral", name = mapping, where = where, returnPathOnly = TRUE)
Otherwise, the hashes of any madrat function using that function directly or indirectly cannot be calculated correctly, because the mapping has hash NA?
I have seen cases like the fingerprint example of Michaja quite a lot when debugging mrremind and edgeTransport caching. Happens also when people write tool functions, but forget to declare the package as madrat package.
What would be the behaviour of fingerprint
if a contributing function or resource has the hash NA as in the example above? Will it simply be ignored?
Would it make sense to sound the alarm when a fingerprint cannot be calculated correctly because the hash of at least one of the contributing functions / sources return NA as a hash?
Yes, this formulation is not supported and thereby causes the warning.
If so, then that should be reflected somewhere.
foo <- function(where = NULL)
{
if (is.symbol(substitute(where)))
{
stop('Only literal arguments for `where` are supported.')
}
return(where)
}
foo('foo')
# [1] "foo"
x <- 'foo'
foo(x)
# Error in foo(x) : Only literal arguments for `where` are supported.
If so, then that should be reflected somewhere.
I agree with Michaja, if this actually leads to mappings being ignored in the caching logic, using arguments should be explicitly forbidden.
Isn't the warning about the mapping not covering this case?
You are right, it seems like the warning covers all the cases where a hash would produce NAs. So it turns out the warning is useful and we must pay attention to it when generating inputdata.
Nobody sees that warning.
$ tar -Oxf /p/projects/rd3mod/inputdata/output/rev6.75APT-2024-05-18_62eff8f7_remind.tgz ./diagnostics.log | sed -n '/WARNING: $/ { N; s/ \n//; }; /WARNING/ { s/~*//g; p }' | sort | uniq -c | sort -k 1nr,1
11640 WARNING: an argument will be fractionally recycled
6630 WARNING: collapsing to unique 'x' values
345 WARNING: error during factorisation of matrix (dgefa); singular matrix
345 WARNING: steady-state not reached
234 WARNING: no non-missing arguments to max; returning -Inf
78 WARNING: Column 'pkm' does not exist to remove
78 WARNING: Column 'weight' does not exist to remove
78 WARNING: getConfig for option "sourcefolder" must not be used from within readSource! Access will be disabled soon!
75 WARNING: Your dimnames in dim=3 contain duplicates! This might lead to erronous results and bad code performance. Please try to avoid duplicates in dimnames under all circumstances!
43 WARNING: 'getRegions<-' is deprecated.
3 WARNING: Data for following unknown country codes removed: ADO, CHI, ZAR, IMY, KSV, ROM, TMP, WBG
2 WARNING: calcOutput('IOEdgeBuildings', subtype = X), with X in (output_EDGE, output_EDGE_buildings) produces negative values, set to 0
2 WARNING: NAs introduced by coercion
1 WARNING: argument 'where' should be set when calling toolGetMapping from within a madrat function.
1 WARNING: c("Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.", i = "Please use `reframe()` instead.", i = "When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.")
1 WARNING: Data for following unknown country codes removed: SCG - By using madrat::toolISOhistorical the data for the following countries can usually still be used: SCG
1 WARNING: Data returned by mrremind:::calcEmissionFactors(...) contains NAs
1 WARNING: Data returned by mrremind:::calcIndustry_Value_Added(...)
1 WARNING: Data returned by mrremind:::calcJRC_IDEES(...) contains NAs
1 WARNING: Duplicate entries found, only the last entry will be used (duplicate entries: GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.startYeBEV.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.startValBEV.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.targetYeBEV.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.targetValBEV.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.startYeICE.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.targetYeICE.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.targetValICE.ptab4W, GLO|NOTIME|gdp_SDP_MC.gdp_SDP_MC.Mix4.ratioPHEV.ptab4W)!
1 WARNING: Expecting numeric in O7310 / R7310C15: got 'incl in gov. services'
…
1 WARNING: Expecting numeric in O7557 / R7557C15: got 'incl in gov. services'
1 WARNING: getConfig for option "globalenv" must not be used from within readSource! Access will be disabled soon!
1 WARNING: getConfig for option "packages" must not be used from within readSource! Access will be disabled soon!
1 WARNING:getConfig must not be used from within retrieveData! Access will be disabled soon!
1 WARNING: 'luscale::speed_aggregate' is deprecated.
1 WARNING: More than 50% missing values for: OAS
1 WARNING: Product/flow combinations not present in mapping added by
1 WARNING: setConfig must not be used from within retrieveData!
Also, I doubt anybody who did not diligently read this issue will be aware that Mapping in <some function> not found!
equals <some function> may have corrupted caches!
.
Just to clarify: I agree, it is not intuitive and the fact that this has been unnoticed for a long time now calls for a more explicit warning. I wouldn't mind throwing an error in the right places either.
For REMIND inputdata generation specifically, we (RSE) will be taking all warnings seriously in the future, so at least I will notice it. But of course, this is not good enough from the perspective of madrat as a package used by users outside our group.
https://github.com/pik-piam/madrat/blob/c4ba674c467269b9a33104fe436c4c4402c26487/R/getCode.R#L57
via pik-piam/mrremind#503 Also,
toolGetMapping()
has perfectly good error messages. https://github.com/pik-piam/madrat/blob/c4ba674c467269b9a33104fe436c4c4402c26487/R/toolGetMapping.R#L44