Closed Felienne closed 2 years ago
Thanks for suggesting this @OnnoEbbens! I think the real issue here is that ask/vraag should not be highlighted in that case!
In "pure English" this is also an issue:
Assigning our syntax highlighting king @thjazi39!
This is an interesting problem! Indeed, the new syntax highlighting system doesn't take the context into account, and recognizes the keywords independently of what else is on the line (except at levels 1, 2 and 3, which are based only on the context and are treated separately).
This has the advantage of guaranteeing that every keyword that will be interpreted is colored, which avoids a lot of errors that are now corrected (`at random` not being recognized in some cases, etc.).
The disadvantage (and normally the only one) of this new system is that it can color "too many" keywords, as in the following example at level 4:
(we see here that the keywords `ask` and `to` are colored, even though what is assigned is a string. This is a typical case of over-coloring)
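To make the over-coloring concrete, here is a minimal Python sketch of a highlighter that ignores context. It is not the Ace rules Hedy actually uses; the keyword list and the `keyword_spans` helper are assumptions made for illustration only.

```python
import re

# Illustrative keyword list only; this is NOT Hedy's real level 4 keyword set.
KEYWORDS = ["at random", "print", "ask", "is", "sleep"]

# Longest alternatives first, so "at random" is tried as a single unit.
KEYWORD_RE = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True))
    + r")\b"
)


def keyword_spans(line):
    """Return (start, end, text) for every keyword match, ignoring context."""
    return [(m.start(), m.end(), m.group()) for m in KEYWORD_RE.finditer(line)]


# "print" is colored here even if it was only meant as part of the assigned text.
print(keyword_spans("note is print me"))
```

Because the pattern never looks at what precedes a word, every occurrence of a keyword is reported, whatever role it plays on the line.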
The question I have now is: is this over-coloring problem serious? Yes, over-coloring means that the syntax coloring will not exactly match what Hedy is going to interpret. On the other hand, it signals to the child that they are using a keyword of the language, which, while not necessarily a problem, is risky behavior.
Moreover, Python has the same problem of over-coloring:
(here we use `print` as a variable, but the coloring still treats it as a keyword, as shown by the comparison with the `var` and `var2` variables)
> The question I have now is: is this over-coloring problem serious? Yes, over-coloring means that the syntax coloring will not exactly match what Hedy is going to interpret. On the other hand, it signals to the child that they are using a keyword of the language, which, while not necessarily a problem, is risky behavior.
I would say, yes, it is very undesired because kids are still learning and the colors help. So if we can get this fixed, please do!
Concerning a fix: since Ace (the syntax coloring system) is based on an automaton with different states, there is surely a way to do better. Typically, `at random` can only be used where a value is expected (for example, following an `is` or a `print`), and it should be possible to use the state system (or tokens, to use the Ace vocabulary) to express that.
However, this might take a lot of time: it requires a thorough analysis of the grammar, and the contexts are quite varied. But I'm interested in doing it, and I'll try once I've managed to set up tests for syntax highlighting.
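As a rough illustration of the state idea, here is a small Python sketch in which a word is only labeled as a keyword when the current state allows it. The states, the `COMMANDS` set and the `classify` function are simplified assumptions, not Ace's API and not Hedy's grammar.

```python
COMMANDS = {"print", "sleep"}  # illustrative subset, not Hedy's real grammar


def classify(line):
    """Label each word as 'keyword' or 'text' using a small state machine."""
    words = line.split()
    out = []
    state = "start"
    i = 0
    while i < len(words):
        word = words[i]
        kind = "text"
        if state == "start":
            # A command keyword is only recognized in first position.
            if word in COMMANDS:
                kind, state = "keyword", "value"
            else:
                state = "after_name"
        elif state == "after_name":
            # After a name, only "is" counts as a keyword.
            if word == "is":
                kind, state = "keyword", "after_is"
            else:
                state = "value"
        elif state == "after_is":
            # Directly after "is", "ask" is a keyword; anything else is a value.
            if word == "ask":
                kind = "keyword"
            state = "value"
        elif state == "value" and word == "at" and words[i + 1:i + 2] == ["random"]:
            # "at random" is only recognized where a value is expected.
            out.append(("at random", "keyword"))
            i += 2
            continue
        out.append((word, kind))
        i += 1
    return out


print(classify("note is print me"))
```

With these rules `print` in `note is print me` stays plain text, while `at random` in `print animals at random` is still colored. The price is the one described above: every change to the grammar means revisiting the states.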
Also, Hedy's grammar may still change, see for example issue #1938.
And if syntax highlighting uses an automaton system, changing the grammar to fix these bugs will require redoing the highlighting automaton as well, which is not ideal for maintenance (redoing the automaton, modifying the tests, etc.). Whereas by freeing ourselves from the context, we are sure that there will be no unrecognized keywords.
Then, apart from these technical considerations, I think it's feasible, and it's done for the first 3 levels. But if it's feasible for the "simple" levels, for the "complicated" levels, I think there will always be syntax coloring bugs.
I don't want to say anything stupid, but this must be due to the fact that the expressiveness of finite automata is strictly smaller than the expressiveness of grammars. As soon as Hedy reaches a sufficiently complicated level, it is impossible not to have syntax highlighting bugs (using a finite-automata-based system). I am doing some research to see what is possible and what is not.
> Then, apart from these technical considerations, I think it's feasible, and it's done for the first 3 levels.
True, it is possible
> But if it's feasible for the "simple" levels, for the "complicated" levels, I think there will always be syntax coloring bugs.
Yeah, that is inevitable, but (esp for the lower levels) we want to be as close to perfect as possible!
> I don't want to say anything stupid, but this must be due to the fact that the expressiveness of finite automata is strictly smaller than the expressiveness of grammars. As soon as Hedy reaches a sufficiently complicated level, it is impossible not to have syntax highlighting bugs (using a finite-automata-based system). I am doing some research to see what is possible and what is not.
No this is not stupid, this is correct! Grammars can be context sensitive, regexes cannot!
While waiting for me to do a more thorough analysis of the grammar, we could perhaps use a synonym to temporarily work around the problem. Maybe `kwestie` will work.
> While waiting for me to do a more thorough analysis of the grammar, we could perhaps use a synonym to temporarily work around the problem. Maybe `kwestie` will work.
Good idea! `kwestie` is not really a word we use, but `antwoord` could be ok?
I'll make a branch with the synonym, and see if there are other similar cases that I could work around until I get a more powerful parser.
I made this little piece of code that illustrates the over-coloring (at level 4 only, and with only English keywords). Even though it is an extreme case that children will never write by themselves, it makes it possible to test the syntax highlighting:
ask is random
sleep is ask ask
is is ask , at random , print
ask is is at random
print ask is at random
Yeah, I see your point and I agree this is not what we want!
I would propose we try to get the syntax highlighter to be a bit more context aware! So only highlight keywords if they are more or less in the right place. Is that possible in the new paradigm?
If it is urgent that it be replaced, this patch #2301 can work
Afterwards, if I have a little more time, I could check if there are similar cases in other places, or in other languages.
As for fixing the bug cleanly by taking the context more into account, that will take me more time, because if it is not done properly, under-coloring bugs will appear.
Yeah, if it is annoying to Onno, let's hot patch it now and fix the syntax highlighting when we have found out how to.
Hi @thjazi39!
Any updates on this (I know you had some exams this week so maybe you did not pursue it further?)
Concerning the resolution of this bug: as soon as my exams are finished, I will check whether there are other cases similar to this one in other languages (I already started a script). That should go quickly and avoid the inconvenience.
Then, to resolve this bug in a cleaner way, I see 2 solutions:
- We "discourage/prohibit" the use of some words for variables. It doesn't seem unreasonable to me to have forbidden words, it's quite common in a language, and I think that children will be able to understand it: "Be careful, you use the name of a command to do something else, Hedy might misunderstand what you want to do". But it means that we have to detect that, and I don't know if it's easily done.
- We change the syntax highlighting to be a little more context based. From what I understand, this solution will be the best. But it won't be so easy (in particular, I'm afraid that if it's not done properly, it will introduce under-coloring bugs). Also, I think I'll tackle this once I've set up tests for syntax highlighting. That's the next task on my todo list, and it will give me safeguards, so that taking context into account doesn't introduce bugs.
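For the first solution, the detection step could look roughly like the Python sketch below. The keyword table, the function name and the "`<name> is <value>`" assignment rule are all assumptions for illustration; only `vraag` as the Dutch counterpart of `ask` comes from this thread.

```python
# Hypothetical keyword table; Hedy's real translations live in its own keyword files.
KEYWORDS = {
    "en": {"ask", "print", "is", "sleep"},
    "nl": {"vraag"},  # only "vraag" (ask) is taken from this thread
}


def keyword_variable_warnings(code, keyword_langs):
    """Return (line_number, name) for each variable named like a keyword."""
    reserved = set().union(*(KEYWORDS[lang] for lang in keyword_langs))
    warnings = []
    for number, line in enumerate(code.splitlines(), start=1):
        words = line.split()
        # Simplified assumption: an assignment looks like "<name> is <value...>".
        if len(words) >= 3 and words[1] == "is" and words[0] in reserved:
            warnings.append((number, words[0]))
    return warnings


program = "vraag is ask 'Wat wil je weten?'\nprint vraag"
print(keyword_variable_warnings(program, ["nl", "en"]))
```

A real check would have to follow Hedy's grammar per level rather than split on whitespace, which is probably where most of the effort would go.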
I think this issue is a bit complicated and for me it is very much related to allowing multiple keyword languages and the discussion here #2080.
I like the first solution of discouraging keywords as variables. However, if we do this I also think we should fix the keyword language and not allow keywords in a second language. Otherwise people are discouraged from using certain words as variables because they are keywords they haven't seen before.
Since we allow multiple keyword languages, I regularly get questions from students about why some words appear in red when they haven't learned these commands. Although easily explained, it is counterintuitive for them. On the other hand I have seen students use the switch NL/EN to see the translation of commands, which might have helped them to understand the code better. So there are some advantages and disadvantages to fixing the keyword language and I can understand both arguments.
Personally I would like to be able to fix the keyword language. Or maybe only fix the keyword language if it is set to English. If the keyword language is set to Dutch I can understand if you allow English as a second language because in the end you will have to use English.
Thanks for the extensive explanation @OnnoEbbens!
> On the other hand I have seen students use the switch NL/EN to see the translation of commands, which might have helped them to understand the code better. So there are some advantages and disadvantages to fixing the keyword language and I can understand both arguments.
That is a cool use case, I had not thought of that!
> Personally I would like to be able to fix the keyword language.
I am not sure though what you mean here by "fix", do you mean fix as in "vastzetten" (pin)?
> Or maybe only fix the keyword language if it is set to English.
Specifically this I don't understand, because in English the language is already fixed in the sense that it will only highlight English keywords.
We can also chat tomorrow of course!
Yeah, sorry for the confusion, I see it's confusing. With 'fix' I meant to only allow this keyword language and no other keyword languages. So the last part becomes:
Personally I would like to be able to only use one keyword language. Or maybe only use one keyword language if the keyword language is set to English. If the keyword language is set to Dutch I can understand if you allow English and Dutch keywords because in the end you will have to use English.
> Personally I would like to be able to only use one keyword language. Or maybe only use one keyword language if the keyword language is set to English. If the keyword language is set to Dutch I can understand if you allow English and Dutch keywords because in the end you will have to use English.
I think this is exactly the way we have it now! In English you can only use (and only get highlighting for) English. In other languages, you can use that language (e.g. Dutch) plus English. But maybe I misunderstand your idea?
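Okay, I tried to explain it more clearly but maybe it's better if we discuss this tomorrow :)
- With these settings you can use both English and Dutch keywords:
- language: Dutch
- keyword language: English
I propose to only allow English keywords in this case.
- With these settings you can use both English and Dutch keywords:
- language: Dutch
- keyword language: Dutch
I propose to keep it that way.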
Ah now I get it, this makes total sense!! This would not be very hard to configure that way (tagging @TiBiBa so he can make an issue and a fix)
I don't get it (yet). So we want to enable teachers to enforce showing only the English syntax highlighter? And if so, should this be aligned with the already implemented "hide keyword language switcher" setting?
No, I think this is separate even from teachers. If kids enable English keywords, they should only be able to use English keywords (and highlighting). So even if the site is set to Dutch (or another language), if you set your keywords to English, you get the same experience as "English English".
So we should also hide the keyword switcher when English is chosen as the keyword language? Currently the keyword switcher (as well as the syntax highlighter) is based on the web language, not the keyword language.
Ah yes that is a consequence too
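The difference between the current behavior and the proposal can be summed up in a short Python sketch. Both function names and the language codes are assumptions for illustration; this is not the code from any Hedy PR.

```python
def highlighted_languages_by_site(site_lang):
    """Behavior described above as current: driven by the web (site) language."""
    return ["en"] if site_lang == "en" else [site_lang, "en"]


def highlighted_languages_by_keyword_lang(keyword_lang):
    """Proposed behavior: driven by the chosen keyword language only."""
    if keyword_lang == "en":
        return ["en"]  # same experience as "English English"
    return [keyword_lang, "en"]


# Site in Dutch, keywords set to English:
print(highlighted_languages_by_site("nl"))          # Dutch and English keywords
print(highlighted_languages_by_keyword_lang("en"))  # English keywords only
```

In this reading, a Dutch site with English keywords would stop coloring `vraag`, while a Dutch site with Dutch keywords would keep accepting both languages.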
Fixed in #2357!
> Okay, I tried to explain it more clearly but maybe it's better if we discuss this tomorrow :)
> - With these settings you can use both English and Dutch keywords:
> - language: Dutch
> - keyword language: English
> I propose to only allow English keywords in this case.
> - With these settings you can use both English and Dutch keywords:
> - language: Dutch
> - keyword language: Dutch
> I propose to keep it that way.
If I understand correctly, we have: [example not preserved]
Or: [example not preserved]
Or: [example not preserved]
But not: [example not preserved]
This means that in the syntax highlighting, the syntaxLang-XX.ts files will have to be modified, to take into account only the XX language
I think #2357 takes care of that by only highlighting differently when another keyword lang is chosen
> I think #2357 takes care of that by only highlighting differently when another keyword lang is chosen
Indeed! That's exactly the approach we take in #2357.
> I think #2357 takes care of that by only highlighting differently when another keyword lang is chosen
what I meant is explained here: #2393
Yes this is a change that has been already taken care of.
In the last days some students noticed something confusing. If you use Dutch with English commands and your code contains a Dutch word that happens to be the translation of an English command, the word is highlighted in red as if it were a command. This happens more frequently than I initially thought. As an example, see the fortune adventure in level 4 (https://www.hedycode.com/hedy/4/#fortune) with the code `vraag is ask 'Wat wil je weten?'`. Would it be doable to implement an English-command-only option where users cannot switch to Dutch and Dutch translations of English commands are not highlighted?
Originally posted by @OnnoEbbens in https://github.com/Felienne/hedy/discussions/2080#discussioncomment-2415140