pik-piam / lucode2

A collection of tools which allow to manipulate and analyze code.
Other
0 stars 14 forks source link

manipulateConfig: doesn't handle GAMS comments properly #121

Open mikapfl opened 1 year ago

mikapfl commented 1 year ago

If given something like this:

parameter
  cm_FlexTaxFeedback          "switch deciding whether flexibility tax feedback on buildlings and industry electricity prices is on"
*** cm_FlexTaxFeedback, switches on feedback of flexibility tax on buildings/industry.
*** That is, electricity price decrease for electrolysis has to be matched by electrictiy price increase in buildings/taxes. 
*** This switch only has an effect if the flexibility tax is on by cm_flex_tax set to 1.
;
  cm_FlexTaxFeedback = 0; !! def 0

and tasked with changing cm_FlexTaxFeedback to 1, manipulateConfig will detect the / in the comment inside the parameter definition and will change everything between the slashes so the file ends up as:

parameter
  cm_FlexTaxFeedback          "switch deciding whether flexibility tax feedback on buildlings and industry electricity prices is on"
*** cm_FlexTaxFeedback, switches on feedback of flexibility tax on buildings/ 1 /taxes. 
*** This switch only has an effect if the flexibility tax is on by cm_flex_tax set to 1.
;
  cm_FlexTaxFeedback = 1; !! def 0

i.e. swallowing a good part of the comments.

Within the current framework of manipulateConfig this is very hard to solve. manipulateConfig uses regular expressions, which are best for context-free grammars, but to properly detect what is happening, we have to detect two contexts - first, the definition context, which in the example goes from parameter to the first ;, but can also span multiple definitions, and then, second, the comment context within the definition context, which spans from *** to the next end of line. While I think it should be theoretically possible to solve this in perl-compatible regular expressions (they are turing-complete, after all), practically, the regular expressions are completely unreadable already, and adding another context detection to them is practically impossible. For the record, this is the regex which detects parameter definitions currently, which does not handle comments properly:

paste0("((\\n|^)[\\t ]*(scalar|parameter|set|)s?[\\t ]*", key, "(|\\([^\\)]*\\))(/|[\\t ]+(\"[^\"]*\"|)[^\"/;]*/))[^/]*")
mikapfl commented 1 year ago

I'm not quite sure, where to go from here. I see the following solutions:

  1. Carry on, forbid / in comments in GAMS. We should probably add some checks to codeCheck or so to enforce this. This can be a template for similar, future problems. If stuff like this crops up because GAMS features are used that our regex-based approach is not equipped to handle, we forbid them and enforce that via codeCheck (since codeCheck is also based on regex, this might sometimes be a bit difficult, but in general, it is easier to detect stuff that you don't want to allow than properly parsing it).
  2. Carry on, add an ad-hoc parser for GAMS definitions in R to manipulateConfig. R is a turing-complete language, it is not difficult to track if you are in one or multiple context when parsing a GAMS file line-by-line in R. This can also be a template for similar, future problems. If something gets too difficult in regex, we instead convert to line-by-line parsing in R, which is much more expressive.
  3. Abandon writing/modifying GAMS from R. Meta-Programming like this, where you write program code (R) to write program code (GAMS) is usually difficult to debug. Instead, use one of the supported methods to get data into GAMS from R, like GAMSTRANSFER or GAMS connect with CSV files or via GDX, or write a GAMS connect agent in python which reads in a YAML file.
  4. Something else?

Sorry to make a big thing out of this again, but given the amount of work we have lately (not only connected to rewriting the start scripts) which revolves around parsing GAMS code with regex, I can't help but think we are busy fixing an ultimately unnecessary approach given the other, saner, options. Sure, GAMSTRANSFER doesn't support $setglobal, but if we can reduce the problem that needs to be solved with regexes to the relatively simple $setglobal instead of parsing set and parameter definitions, I would already be quite happy.

orichters commented 1 year ago