grammarware / software-evolution

Software Evolution
MIT License

Question about ambiguity in case insensitivity and keywords #23

Closed: Michael-Janssen-dev closed this issue 3 months ago

Michael-Janssen-dev commented 4 months ago

It is still unclear to me how the ambiguity around case sensitivity is resolved. The Canvas page states 'in the face of ambiguity, words written in uppercase resolve to keywords, otherwise to field names'. To me this implies that the parser should find the path through the parse forest with the fewest edits, i.e. the fewest keywords written in lowercase plus the fewest conflicting identifiers written in uppercase.

If we apply this theory to the examples, it quickly falls apart. Take example 2: ADD A to B ADD C TO D ADD E TO F. One parse reads this as two statements, ADD [A to B ADD C] TO D and ADD E TO F. Here there is one 'imposter': the second 'ADD', downgraded from a keyword to an identifier, so this route has a score of 1. However, if we parse the sentence differently, as three statements ADD A to B, ADD C TO D and ADD E TO F, there is also just one 'imposter': the 'to', upgraded to a keyword. This route also has a score of 1, so the two parses are ambiguous. The example, however, says that only the first parse is valid.
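A minimal Python sketch of the scoring described above, to show where the tie comes from. It assumes that only ADD and TO are keyword candidates and that each parse is supplied by hand as the set of word positions read as keywords; no real parser is involved. The names `KEYWORDS`, `edit_cost` and `resolve` are hypothetical.

```python
# Hypothetical sketch of the "fewest edits" theory from the question.
# Assumptions: only ADD and TO are keyword candidates, and each parse is
# given by hand as the set of word positions read as keywords.
KEYWORDS = {"ADD", "TO"}


def edit_cost(words: list[str], keyword_positions: set[int]) -> int:
    """Count upgrades (a word not written in uppercase read as a keyword)
    plus downgrades (an uppercase keyword candidate read as a field name)."""
    cost = 0
    for i, word in enumerate(words):
        if word.upper() not in KEYWORDS:
            continue  # an ordinary field name such as A or B costs nothing
        as_keyword = i in keyword_positions
        if as_keyword and not word.isupper():
            cost += 1  # upgrade, e.g. "to" read as TO
        elif not as_keyword and word.isupper():
            cost += 1  # downgrade, e.g. "ADD" read as a field name
    return cost


def resolve(costs: dict[str, int]) -> str:
    """Pick the cheapest parse; a tie for the minimum is an ambiguity."""
    best = min(costs.values())
    cheapest = [name for name, c in costs.items() if c == best]
    if len(cheapest) > 1:
        raise ValueError(f"ambiguous: {cheapest} all cost {best}")
    return cheapest[0]


words = "ADD A to B ADD C TO D ADD E TO F".split()
parse_1 = {0, 6, 8, 10}        # ADD [A to B ADD C] TO D / ADD E TO F
parse_2 = {0, 2, 4, 6, 8, 10}  # ADD A to B / ADD C TO D / ADD E TO F

costs = {"parse 1": edit_cost(words, parse_1),
         "parse 2": edit_cost(words, parse_2)}
print(costs)  # {'parse 1': 1, 'parse 2': 1}
```

Under this scoring, calling `resolve(costs)` raises the ambiguity error, which is exactly the clash described above, even though the example accepts only the first parse.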

Now consider the last example of the explainer (shown as an image on the original page). First of all, the second and fourth lines are the same. If we list all the upgrades and downgrades on each line (there is an implicit upgrade of the last 'to' in all five parses), we get:

  1. Downgrade 'ADD', downgrade 'ADD'
  2. Downgrade 'ADD', upgrade 'to'
  3. Upgrade 'to', downgrade 'ADD'
  4. (same line as 2.)
  5. Upgrade 'to', upgrade 'to'

The theory would therefore hold for this example (and also the one before it).

The last two examples would imply that the compiler penalises upgrading and downgrading equally: if two paths have the same total number of upgrades plus downgrades, they are considered ambiguous.

In conclusion, I am very confused about what the rules around case and keyword ambiguity are, and would like some clarification.

grammarware commented 4 months ago

I have edited the description on Canvas to give a short description of each case; hopefully there is less confusion now.

In general, the only error here is that line 4 repeats line 2. Otherwise, we have two parses (1 and 2/4) with one "fix" each. Yes, 2 and 4 also count as one fix, because we need to take only one leap of faith by assuming that the second to is in fact TO, and the third keyword candidate has no choice in this case. The parses on lines 3 and 5 need two "fixes", so if there were only two alternatives, one with one fix and one with two, we could simply rule out the more "expensive" one. Here, however, there are two equally expensive parses and two equally cheap ones, so counting fixes does not help.

I was not considering "downgrading" at all, since "upgrading" is basically changing the type of a token to a keyword, while "downgrading" assumes that you have already done so and now need to roll it back. If downgrading is taken into account, these four parses get different costs, but some of them will still clash at the same cost, so the ambiguity remains unresolved. The best course for the compiler is therefore still to give up and report an error to the developer.
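To make this reasoning concrete, here is a small hypothetical Python sketch that uses only the fix counts stated in this reply (downgrades are not counted). The helpers `cheapest` and `choose` are illustrative names, not part of any course tooling.

```python
from typing import Optional

# Fix counts as stated in the reply: parses 1 and 2/4 need one "fix" each,
# parses 3 and 5 need two. Parse 4 duplicates parse 2, so the two are
# merged under one key. All names here are illustrative only.
fixes = {"1": 1, "2/4": 1, "3": 2, "5": 2}


def cheapest(costs: dict[str, int]) -> list[str]:
    """All parses that share the minimum cost, in sorted order."""
    best = min(costs.values())
    return sorted(name for name, c in costs.items() if c == best)


def choose(costs: dict[str, int]) -> Optional[str]:
    """Return the single cheapest parse, or None when the minimum is shared,
    meaning the compiler should give up and report an error."""
    winners = cheapest(costs)
    return winners[0] if len(winners) == 1 else None


# With only two alternatives of different cost, the cheaper one wins.
print(choose({"one fix": 1, "two fixes": 2}))  # one fix
# With the four distinct parses from the example, two tie at one fix.
print(cheapest(fixes), choose(fixes))  # ['1', '2/4'] None
```

Counting downgrades as well would change the numbers, but as the reply notes, a tie at the minimum would still leave `choose` with no single answer.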