Closed 0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q closed 8 months ago
codeCheck
misses this because the variable is called p37_specFEDem
instead of p37_specFeDem
. As I learned, it is the same for GAMS.
How to proceed @tscheypidi? Should I simply make the check for variable usage outside the module case insensitive or should we introduce a warning / error when several variable names are found that are the same to GAMS like in this example?
There is only one correct way to handle this.
GAMS is not case sensitive. This means that lower and upper case letters may be mixed freely but are treated identically.
@fbenke-pik given that having different versions of the same variable in the code can lead to various problems I would suggest to have a dedicated warning for these instances rather than making the other check case insensitive.
The easiest fix seems to be to make checkAppearance ignore case. This function gathers the occurrences of all declared variables in all modules and is the foundation for the error checking in this issue. If we can agree on this fix, this PR would solve this particular issue already.
Checking if declared variables appear in various versions in the code is surely a useful check. Even if GAMS allows any variable name version, we might not want to allow it, as it is error-prone (e.g. when doing search/replace during a refactoring). However, at first glance it probably will be a time-consuming check to search for all versions of each declared variable in the whole codebase, so it comes at a cost (assuming, we do a naive implementation and don't try to somehow build on the results from checkAppearance to make it more efficient). Is it worth implementing?
Even if GAMS allows any variable name version, we might not want to allow it, as it is error-prone (e.g. when doing search/replace during a refactoring).
@fbenke-pik I thought that the check should be rather easy to implement as we have the list of all parameters/variables already extracted in codeCheck
. Something like anyDuplicates(tolower(names))
should do it, or not?
@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q There is an important difference between "you can search case-insensitive" and "users search case-insensitive". The fact that you can circumvent that error does not mean that people will not run into this trap. And also depending on what you want to do or figure out in the code you do not necessarily run into a compilation error...(you might just miss an important line in the code and thereby being not able to understand another problem)
@fbenke-pik I thought that the check should be rather easy to implement as we have the list of all parameters/variables already extracted in
codeCheck
. Something likeanyDuplicates(tolower(names))
should do it, or not?
Yes, you have a list of all the declared parameters/variables extracted. But in this case p37_specFeDem
was declared (so it is in the list). And then in the code it was used in some places as p37_specFEDem
(which is not in the list).
So an algorithm probably has to count occurrences of all declared variables in all the code with and without case sensitive condition and then check if the count differs.
ah ok, I forgot that only the declarations are scanned. In that case it indeed looks like a bit more computational effort. @fbenke-pik what is your opinion on it?
It all depends on whether you view variables with different capitalization as a problem. As I don't work directly with GAMS / REMIND, i don't know if this is a problem in practice. Assuming you view it as a problem, we should add proper code a check, fix all current occurrences in REMIND and accept the increase in runtime.
This is a PR that implements such a check (so far only prints the variables) https://github.com/pik-piam/gms/pull/84
Please take a look to get an idea which variables are currently potentially affected in REMIND (I still need to check thoroughly for false positives)
Of course i can make it an optional check so each model team can decide individually, if they want to enforce this. @LaviniaBaumstark would you be willing to enforce this for REMIND?
I would be in favor of enforcing it. Users tend to search the code without considering the option that the same variable/parameter might be written with different capitalization and I do not see an advantage in allowing for this kind of flexibility. I guess all instances where the capitalization are used different are probably more mistakes than conscious decisions.
The issue I just see is the runtime of the check :(
I agree to Jan. But first we would need to fix it for REMIND before activating it :-)
If we can agree on this fix, this PR would solve this particular issue already.
This is not that easy as I initially thought. We would have to refactor all checks of codeCheck to make sure it sees GAMS vars as equal even with different capitalization.
@tscheypidi @LaviniaBaumstark In order to move forward, I need a binding decision how to proceed: 1) Enforce case sensitivity in codeCheck 2) Refactor codeCheck to view variables as equal regardless of capitalization just like GAMS does 3) Support both approaches, make it an option for codeCheck
I am in favor of 1., especially also in the light that a case insensitive check does not seem to be as easy to be implemented as initially thought.
I am in favor of 1., especially also in the light that a case insensitive check does not seem to be as easy to be implemented as initially thought.
To clarify, it is not necessarily hard, just a larger refactoring touching all checks. The core of codeCheck is an appearance matrix that tracks declared and not used variables times the realizations of all models. This matrix would have to be unified (e.g. all lowercase) to remove duplicates due to different capitalization. As all checks build on top of this matrix, I would have to check all the checks to make sure this does not break anything.
under this condition, I am also in favor of 1, but this also means that we will have the longer test, right?
under this condition, I am also in favor of 1, but this also means that we will have the longer test, right?
yes, the additional test takes 2 minutes. at first glance, i don't see an easy way to reduce complexity.
I would say this 2 minutes are o.k.
@fbenke-pik Concerning runtime I was wondering whether stringr
could help here to speed things up (already a dependency of gms
. I think you currently search the code twice which is probably that time consuming part. stringr
allows to search the code for occurrences of a string and to extract all these occurrences. If you do this kind of extraction you could just afterwards analyze these extracted objects which might potentially be faster and perhaps even more reliable. What do you think?
@fbenke-pik Concerning runtime I was wondering whether
stringr
could help here to speed things up (already a dependency ofgms
. I think you currently search the code twice which is probably that time consuming part.stringr
allows to search the code for occurrences of a string and to extract all these occurrences. If you do this kind of extraction you could just afterwards analyze these extracted objects which might potentially be faster and perhaps even more reliable. What do you think?
See my reply here: https://github.com/pik-piam/gms/pull/84#issuecomment-1924185429
Thanks for testing
As Jan and Lavinia decided to go with unifying capitalization, this is the PR to address this issue: https://github.com/pik-piam/gms/pull/84
I am working on a corresponding PR to REMIND that fixes all current instances that make the test fail: https://github.com/remindmodel/remind/pull/1542
In remind#1b90822,
codeCheck()
should complain aboutas
p37_specFEDem
should be exclusive to module37_industry
, but does not.