codeCheck() misses interface violation

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 9 months ago

In remind#1b90822, codeCheck() should complain about

  + sum(tePrc2opmoPrc(tePrc,opmoPrc)$(p37_specFEDem("2005",regi,enty,tePrc,opmoPrc) gt 0.),
      p37_specFEDem("2005",regi,enty,tePrc,opmoPrc)

as p37_specFEDem should be exclusive to module 37_industry, but does not.

fbenke-pik commented 9 months ago

codeCheck misses this because the variable is called p37_specFEDem instead of p37_specFeDem. As I learned, it is the same for GAMS.

How to proceed @tscheypidi? Should I simply make the check for variable usage outside the module case insensitive or should we introduce a warning / error when several variable names are found that are the same to GAMS like in this example?

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 9 months ago

There is only one correct way to handle this.

GAMS is not case sensitive. This means that lower and upper case letters may be mixed freely but are treated identically.

[↑]

tscheypidi commented 9 months ago

@fbenke-pik given that having different versions of the same variable in the code can lead to various problems I would suggest to have a dedicated warning for these instances rather than making the other check case insensitive.

fbenke-pik commented 9 months ago

The easiest fix seems to be to make checkAppearance ignore case. This function gathers the occurrences of all declared variables in all modules and is the foundation for the error checking in this issue. If we can agree on this fix, this PR would solve this particular issue already.

Checking if declared variables appear in various versions in the code is surely a useful check. Even if GAMS allows any variable name version, we might not want to allow it, as it is error-prone (e.g. when doing search/replace during a refactoring). However, at first glance it probably will be a time-consuming check to search for all versions of each declared variable in the whole codebase, so it comes at a cost (assuming, we do a naive implementation and don't try to somehow build on the results from checkAppearance to make it more efficient). Is it worth implementing?

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 9 months ago

Even if GAMS allows any variable name version, we might not want to allow it, as it is error-prone (e.g. when doing search/replace during a refactoring).

That is why you can search case-insensitive.
Even if one does not do that, it will only result in a compilation error and therefore be captured easily and early.

tscheypidi commented 9 months ago

@fbenke-pik I thought that the check should be rather easy to implement as we have the list of all parameters/variables already extracted in codeCheck. Something like anyDuplicates(tolower(names)) should do it, or not?

tscheypidi commented 9 months ago

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q There is an important difference between "you can search case-insensitive" and "users search case-insensitive". The fact that you can circumvent that error does not mean that people will not run into this trap. And also depending on what you want to do or figure out in the code you do not necessarily run into a compilation error...(you might just miss an important line in the code and thereby being not able to understand another problem)

fbenke-pik commented 9 months ago

@fbenke-pik I thought that the check should be rather easy to implement as we have the list of all parameters/variables already extracted in codeCheck. Something like anyDuplicates(tolower(names)) should do it, or not?

Yes, you have a list of all the declared parameters/variables extracted. But in this case p37_specFeDem was declared (so it is in the list). And then in the code it was used in some places as p37_specFEDem (which is not in the list).

So an algorithm probably has to count occurrences of all declared variables in all the code with and without case sensitive condition and then check if the count differs.

tscheypidi commented 9 months ago

ah ok, I forgot that only the declarations are scanned. In that case it indeed looks like a bit more computational effort. @fbenke-pik what is your opinion on it?

fbenke-pik commented 9 months ago

It all depends on whether you view variables with different capitalization as a problem. As I don't work directly with GAMS / REMIND, i don't know if this is a problem in practice. Assuming you view it as a problem, we should add proper code a check, fix all current occurrences in REMIND and accept the increase in runtime.

This is a PR that implements such a check (so far only prints the variables) https://github.com/pik-piam/gms/pull/84

Please take a look to get an idea which variables are currently potentially affected in REMIND (I still need to check thoroughly for false positives)

fbenke-pik commented 9 months ago

Of course i can make it an optional check so each model team can decide individually, if they want to enforce this. @LaviniaBaumstark would you be willing to enforce this for REMIND?

tscheypidi commented 9 months ago

I would be in favor of enforcing it. Users tend to search the code without considering the option that the same variable/parameter might be written with different capitalization and I do not see an advantage in allowing for this kind of flexibility. I guess all instances where the capitalization are used different are probably more mistakes than conscious decisions.

The issue I just see is the runtime of the check :(

LaviniaBaumstark commented 9 months ago

I agree to Jan. But first we would need to fix it for REMIND before activating it :-)

fbenke-pik commented 9 months ago

If we can agree on this fix, this PR would solve this particular issue already.

This is not that easy as I initially thought. We would have to refactor all checks of codeCheck to make sure it sees GAMS vars as equal even with different capitalization.

@tscheypidi @LaviniaBaumstark In order to move forward, I need a binding decision how to proceed: 1) Enforce case sensitivity in codeCheck 2) Refactor codeCheck to view variables as equal regardless of capitalization just like GAMS does 3) Support both approaches, make it an option for codeCheck

tscheypidi commented 9 months ago

I am in favor of 1., especially also in the light that a case insensitive check does not seem to be as easy to be implemented as initially thought.

fbenke-pik commented 9 months ago

I am in favor of 1., especially also in the light that a case insensitive check does not seem to be as easy to be implemented as initially thought.

To clarify, it is not necessarily hard, just a larger refactoring touching all checks. The core of codeCheck is an appearance matrix that tracks declared and not used variables times the realizations of all models. This matrix would have to be unified (e.g. all lowercase) to remove duplicates due to different capitalization. As all checks build on top of this matrix, I would have to check all the checks to make sure this does not break anything.

LaviniaBaumstark commented 9 months ago

under this condition, I am also in favor of 1, but this also means that we will have the longer test, right?

fbenke-pik commented 9 months ago

under this condition, I am also in favor of 1, but this also means that we will have the longer test, right?

yes, the additional test takes 2 minutes. at first glance, i don't see an easy way to reduce complexity.

LaviniaBaumstark commented 9 months ago

I would say this 2 minutes are o.k.

tscheypidi commented 9 months ago

@fbenke-pik Concerning runtime I was wondering whether stringr could help here to speed things up (already a dependency of gms. I think you currently search the code twice which is probably that time consuming part. stringr allows to search the code for occurrences of a string and to extract all these occurrences. If you do this kind of extraction you could just afterwards analyze these extracted objects which might potentially be faster and perhaps even more reliable. What do you think?

fbenke-pik commented 9 months ago

@fbenke-pik Concerning runtime I was wondering whether stringr could help here to speed things up (already a dependency of gms. I think you currently search the code twice which is probably that time consuming part. stringr allows to search the code for occurrences of a string and to extract all these occurrences. If you do this kind of extraction you could just afterwards analyze these extracted objects which might potentially be faster and perhaps even more reliable. What do you think?

See my reply here: https://github.com/pik-piam/gms/pull/84#issuecomment-1924185429

tscheypidi commented 9 months ago

Thanks for testing

fbenke-pik commented 9 months ago

As Jan and Lavinia decided to go with unifying capitalization, this is the PR to address this issue: https://github.com/pik-piam/gms/pull/84

I am working on a corresponding PR to REMIND that fixes all current instances that make the test fail: https://github.com/remindmodel/remind/pull/1542

pik-piam / gms

codeCheck() misses interface violation #82