pik-piam / edgeTransport

A detailed transport sector model.
5 stars 15 forks source link

Partial matching in edgeTransport #292

Open 0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q opened 1 week ago

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 week ago

Current input data processing logs sprawl with warnings for partial matching:

$ tar -Oxf /p/projects/rd3mod/inputdata/output_1.27/rev6.99_62eff8f7_remind.tgz ./diagnostics.log \
    | grep "partial match" \
    | sort \
    | uniq -c
    102 ~~~~~ WARNING: partial match of 'energyIntensity' to 'energyIntensityRaw'
    126 ~~~~~ WARNING: partial match of 'filter' to 'filterEntries'
     43 ~~~~~ WARNING: partial match of 'loadFactor' to 'loadFactorRaw'

which is bad for several reasons (in order of increasing severity):

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 week ago

Options

It appears that the 271 errors are caused by at most eight instances in the code:

$ grep -Eno '[[:alpha:]][[:alnum:]]*\$(energyIntensity|filter|loadFactor)\>' -- ./R/*.R 
./R/iterativeEDGETransport.R:94:helpers$filter
./R/toolApplyScenSpecLoadFactor.R:20:helpers$filter
./R/toolApplyScenSpecLoadFactor.R:26:helpers$filter
./R/toolCalculateInitialIncoCost.R:89:helpers$filter
./R/toolCombineCAPEXandOPEX.R:71:helpers$filter
./R/toolLoadInputs.R:19:mrtransportData$energyIntensity
./R/toolPrepareScenInputData.R:48:inputDataRaw$loadFactor
./R/toolPrepareScenInputData.R:59:inputDataRaw$energyIntensity

If you know that energyIntensity, filter, and loadFactor do not exist, and should always be energyIntensityRaw, filterEntries, and loadFactorRaw raw instead, you can fix them using

$ sed \
    -e 's/\([[:alpha:]][[:alnum:]]*\$energyIntensity\>\)/\1Raw/g' \
    -e 's/\([[:alpha:]][[:alnum:]]*\$filter\>\)/\1Entries/g' \
    -e 's/\([[:alpha:]][[:alnum:]]*\$loadFactor\>\)/\1Raw/g' \
    -i ./R/*.R

If however energyIntensity, filter, and loadFactor should exist (at least under certain circumstances), you can use

$ sed 's/\([[:alpha:]][[:alnum:]]*\)\$\(energyIntensity\|filter\|loadFactor\>\)/\1[["\2"]]/g' -i ./R/*.R

and then deal with the resulting errors. ($filter fill only match filterEntries if filter does not exist – so [["filter"]] fill return NULL and the subsequent code will break.)

johannah-pik commented 1 week ago

Hi @0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q and @fbenke-pik,

this is not a bug but simply a bad choice of variable names. It does not throw a warning locally, that's why I didn't stumble across it during the development of new edget. I also have to admit that reading that one line in the R language rules, which talks about partial matching, unfortunately didn't sensitize me to it.

Can you may provide a bit more information on why it is so bad to have one variable named energyIntensityRaw and one that is called energyIntensity? And can you also provide some advice on choosing variable names that are intuitively understandable for two very similar variables? Coming up with so many names can be hard ;D

johannah-pik commented 1 week ago

I appreciate you efforts to get the input data processing warning free and I am always open to improve our code. On the other side I want to emphasize that old edget threw countless warnings no one ever cared about and locally we got rid of all of them :)

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 week ago

I, for one, did very much care about the number warnings in the input data preparation. I admit that I was neglectful in not commending for this improvement. Your work is very much appreciated.

On the matter of local warnings: set options(warnPartialMatchDollar = TRUE) (e.g. in your Sys.getenv('R_PROFILE_USER') file).


This is not about variable names. You can name your variables whatever you want. This is about a quirk of the R language that was grandfathered in at some point and is being kept for backwards compatibility, but is more of a liability than a feature. Suppose you have this list

inputDataRaw <- list(energyIntensityRaw = 13)
inputDataRaw['energyIntensity'] <- inputDataRaw[['energyIntensityRaw']] / 10

inputDataRaw
# $energyIntensityRaw
# [1] 13
# 
# $energyIntensity
# [1] 1.3

And then you operate on the energyIntensity element to do some calculation

inputDataRaw$energyIntensity ^ 2
# [1] 1.69

Works as intended.

Now suppose for some reason, the energyIntensity element is deleted somewhere else (this list is passed into the functions as a parameter after all):

inputDataRaw['energyIntensity'] <- NULL

Now the element you want to operate on does not exist anymore, which should throw an error. Instead, R merrily continuous to operate on a similarly named element, without telling you so

inputDataRaw$energyIntensity ^ 2
# [1] 169

with a different result.

From the outside (and for you in a couple of months when there is a bug somewhere in the input data generation and a submission deadline is pressing, and you would rather deal with something else), it is not easy to figure out whether energyIntensity should actually be called energyIntensityRaw, or energyIntensity should actually exist.

From your explanation, I understand that energyIntensity is a valid variable and should exist in the inputDataRaw list. If that is the case, then this is a bug, because it does not exist and R operates on the energyIntensityRaw element instead.

johannah-pik commented 1 week ago

Thanks for caring and making me aware of this! I did not know that R is doing this and sure that is a potential source of error. Fortunately, all the instances where this has happened have been harmless and only resulted in me not noticing the spelling error.
I fixed them all and now I don't get any warnings. @fbenke-pik do you think this is sufficient for now? I will also inform the other transport members that including options(warnPartialMatchDollar = TRUE) to the Rprofile is very helpful. And could RSE provide this information in the R Programming Standards?

fbenke-pik commented 1 week ago

Thank you for your help, Michaja. Super insightful.

I fixed them all and now I don't get any warnings. @fbenke-pik do you think this is sufficient for now?

Yes, for now that should be fine, I think. Can you link the corresponding PR here?

And could RSE provide this information in the R Programming Standards?

I updated the standards with a section, using Michajas example. It also tells you to set the options accordingly.

Most importantly: @fbenke-pik is working hard to get the input data warning-free, so that warnings are actually meaningful again and not ignored outright.

Yes, this is the plan. We want to make the inputdata log meaningful again and I will go through all the warnings sooner or later. I haven't started yet due to other pressing issues, but it it is high up on my list.

countless warnings no one ever cared about.

That used to be true, but I am caring about them now :)

johannah-pik commented 1 week ago

Merci! I will post the PR here..

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 week ago

We want to make the inputdata log meaningful again and I will go through all the warnings sooner or later. I haven't started yet due to other pressing issues, but it it is high up on my list.

Do you expect to get rid of

~~~~~ getConfig for option "sourcefolder" must not be used from within readSource! Access will be disabled soon!

before its three-year anniversary?

$ git blame -L 54,54 R/getConfig.R
c1373b9f (Jan Philipp Dietrich 2021-12-22 16:37:45 +0100 54)           warning("getConfig must not be used from within ", w, "! Access will be disabled soon!")

:rofl: