hedyorg / hedy

Hedy is a gradual programming language to teach children programming. Gradual languages use different language levels, where each level adds new concepts and syntactic complexity. At the end of the Hedy level sequence, kids master a subset of syntactically valid Python.
https://www.hedy.org
European Union Public License 1.2
1.31k stars 287 forks

Don't highlight ask/vraag at the beginning of the line in level over 3 #2279

Closed Felienne closed 2 years ago

Felienne commented 2 years ago

Over the last few days some students noticed something confusing. If you use Dutch with English commands and your code contains a Dutch word that happens to be the translation of an English command, the word is highlighted in red as if it were a command. This happens more frequently than I initially thought. As an example, see the fortune adventure in level 4 (https://www.hedycode.com/hedy/4/#fortune) with the code vraag is ask 'Wat wil je weten?'. Would it be doable to implement an English-command-only option where users cannot switch to Dutch and Dutch translations of English commands are not highlighted?

Originally posted by @OnnoEbbens in https://github.com/Felienne/hedy/discussions/2080#discussioncomment-2415140

Felienne commented 2 years ago

Thanks for suggesting this @OnnoEbbens! I think the real issue here is that ask/vraag should not be highlighted in that case!

In "pure English" this is also an issue:

image

Assigning our syntax highlighting king @thjazi39!

thjazi39 commented 2 years ago

This is an interesting problem! Indeed, the new syntax highlighting system doesn't take the context into account and recognizes keywords independently of what else is on the line (except at levels 1, 2 and 3, which are based only on the context and are handled separately).

This has the advantage that every keyword that will be interpreted is guaranteed to be colored, which fixes a lot of earlier errors (at random not being recognized in some cases, etc.).

The disadvantage (and normally the only one) of this new system is that it can color "too many" keywords, as in the following example at level 4: image (we see here that the keywords ask and to are colored, although what is assigned is a string. This is a typical case of over-coloring).
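To make the over-coloring concrete, here is a minimal Python sketch of a context-free keyword matcher. It is not the Ace highlighter itself: the KEYWORDS list and the highlight_context_free function are made up for illustration, and string literals are deliberately not handled.

```python
import re

# Hypothetical keyword subset (English plus Dutch "vraag"); not Hedy's real list.
KEYWORDS = ["ask", "vraag", "print", "is", "at random", "sleep"]

# Longest keywords first, so a multi-word keyword wins over a shorter one.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True))
    + r")\b"
)


def highlight_context_free(line: str) -> list[str]:
    """Return every keyword occurrence, ignoring where it sits on the line."""
    return _PATTERN.findall(line)


print(highlight_context_free("vraag is ask 'Wat wil je weten?'"))
```

On the line from the fortune adventure, vraag is reported as a keyword even though it is only a variable name there, which is exactly the over-coloring described above.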

The question I have now is: is this over-coloring problem serious? Yes, over-coloring means that the syntax coloring will not quite match what Hedy is going to interpret, but on the other hand, it signals to the child that they are using a keyword of the language, and while that is not necessarily a problem, it is risky behavior.

Moreover, Python has the same problem of over-coloring: image (here we use print as a variable, but the coloring still considers it as a keyword, as shown by the comparison with var and var2 variables)

Felienne commented 2 years ago

The question I have now is: is this over-coloring problem serious? Yes, over-coloring means that the syntax coloring will not quite match what Hedy is going to interpret, but on the other hand, it signals to the child that they are using a keyword of the language, and while that is not necessarily a problem, it is risky behavior.

I would say, yes, it is very undesired because kids are still learning and the colors help. So if we can get this fixed, please do!

thjazi39 commented 2 years ago

Concerning a solution: since Ace (the syntax coloring system) is based on an automaton with different states, there is surely a way to do better. Typically, at random can only be used where we are looking for a value (for example, following an is or a print), and it should be possible to express that with the state system (or tokens, to use the Ace vocabulary).
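As a rough illustration of what "using the states" could look like, here is a small Python sketch that labels words by their position on the line. The line shapes it accepts (name is ..., print ..., sleep ...) are a simplified assumption and not Hedy's real grammar, and label_words and keywords_in are hypothetical names.

```python
def label_words(line: str) -> list[tuple[str, str]]:
    """Label each word as 'keyword' or 'text' depending on the current state."""
    words = line.split()
    labels: list[tuple[str, str]] = []
    state = "start"
    i = 0
    while i < len(words):
        word = words[i]
        if state == "start":
            if words[1:2] == ["is"]:
                # "<name> is ...": the first word is a variable name,
                # even if it happens to spell a keyword.
                labels.append((word, "text"))
                state = "after_name"
            elif word in ("print", "sleep"):
                labels.append((word, "keyword"))
                state = "value"
            else:
                labels.append((word, "text"))
                state = "value"
        elif state == "after_name":
            # Only reached when the second word is "is".
            labels.append((word, "keyword"))
            state = "after_is"
        elif state == "after_is" and word == "ask":
            labels.append((word, "keyword"))
            state = "value"
        elif word == "at" and words[i + 1 : i + 2] == ["random"]:
            # "at random" is only a keyword where a value is expected.
            labels.append(("at random", "keyword"))
            state = "value"
            i += 1  # skip "random", it was consumed together with "at"
        else:
            labels.append((word, "text"))
            state = "value"
        i += 1
    return labels


def keywords_in(line: str) -> list[str]:
    """Return only the words that were labelled as keywords."""
    return [word for word, kind in label_words(line) if kind == "keyword"]


print(keywords_in("vraag is ask 'Wat wil je weten?'"))
```

With this approach vraag at the start of the line is no longer labelled as a keyword. The price is that every new line shape needs its own state handling, which is the maintenance cost in question.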

However, this might take a lot of time: it requires a thorough analysis of the grammar, and the contexts are quite varied. I'm interested in doing it, though, and I'll try once I've managed to set up tests for syntax highlighting.

Also, Hedy's grammar may still change, see for example issue #1938.

And if syntax highlighting uses a context-aware automaton, changing the grammar will require redoing that automaton, which is not ideal for maintenance (redoing the automaton, modifying the tests, etc.). Whereas by freeing ourselves from the context, we are sure that there will be no unrecognized keywords.

thjazi39 commented 2 years ago

Then, apart from these technical considerations, I think it's feasible, and it's done for the first 3 levels. But if it's feasible for the "simple" levels, for the "complicated" levels, I think there will always be syntax coloring bugs.

I don't want to say anything stupid, but this must be due to the fact that the expressiveness of finite automata is strictly smaller than the expressiveness of grammars. As soon as Hedy reaches a sufficiently complicated level, it is impossible not to have syntax highlighting bugs (using a system based on finite automata). I am doing some research to see what is possible and what is not.

Felienne commented 2 years ago

Then, apart from these technical considerations, I think it's feasible, and it's done for the first 3 levels.

True, it is possible

But if it's feasible for the "simple" levels, for the "complicated" levels, I think there will always be syntax coloring bugs.

Yeah, that is inevitable, but (esp for the lower levels) we want to be as close to perfect as possible!

I don't want to say anything stupid, but this must be due to the fact that the expressiveness of finite automata is strictly smaller than the expressiveness of grammars. As soon as Hedy reaches a sufficiently complicated level, it is impossible not to have syntax highlighting bugs (using a system based on finite automata). I am doing some research to see what is possible and what is not.

No this is not stupid, this is correct! Grammars can be context sensitive, regexes cannot!
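A classic illustration of that gap, unrelated to Hedy's actual grammar, is nested parentheses. A regular expression can only be written for a fixed maximum nesting depth, while a matcher that keeps a counter handles any depth. The Python sketch below uses made-up names (DEPTH_TWO, balanced_by_regex, balanced_by_counter) purely to show the idea.

```python
import re

# Accepts sequences of parentheses nested at most two levels deep.
# Supporting a third level would require writing a different, longer pattern.
DEPTH_TWO = re.compile(r"(?:\((?:\(\))*\))*")


def balanced_by_regex(text: str) -> bool:
    """Check balance with a fixed-depth regular expression."""
    return DEPTH_TWO.fullmatch(text) is not None


def balanced_by_counter(text: str) -> bool:
    """Check balance with a counter, which a finite automaton does not have."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


for sample in ["(())", "((()))"]:
    print(sample, balanced_by_regex(sample), balanced_by_counter(sample))
```

Both functions agree on (()), but only the counter-based one accepts ((())).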

thjazi39 commented 2 years ago

While waiting for me to do a more thorough analysis of the grammar, we could perhaps use a synonym in order to temporarily circumvent the problem. Maybe kwestie will work

Felienne commented 2 years ago

While waiting for me to do a more thorough analysis of the grammar, we could perhaps use a synonym in order to temporarily circumvent the problem. Maybe kwestie will work

Good idea! kwestie is not really a word we use, but antwoord could be ok?

thjazi39 commented 2 years ago

I'll make a branch with the synonym, and see if there are other similar cases that I could work around until I get a more powerful parser

I made this little piece of code that illustrates the over-coloring (and that at level 4 alone, with only English keywords). Even if it's an extreme case that children will never write by themselves, it will make it possible to test the syntax highlighting: image

ask is random
sleep is ask ask
is is ask , at random , print
ask is is at random
print ask is at random

Felienne commented 2 years ago

Yeah, I see your point and I agree this is not what we want!

I would propose we try to get the syntax highlighter to be a bit more context aware! So only highlight keywords if they are more or less in the right place. Is that possible in the new paradigm?

thjazi39 commented 2 years ago

If it is urgent that it be replaced, this patch #2301 can work

Afterwards, if I have a little more time, I could check if there are similar cases in other places, or in other languages.

As for a clean fix of the bug that takes the context more into account, it will take me more time, because if it is not done properly, under-coloring bugs will appear.

Felienne commented 2 years ago

Yeah, if it is annoying to Onno, let's hot patch it now and fix the syntax highlighting when we have found out how to

Felienne commented 2 years ago

Hi @thjazi39!

Any updates on this (I know you had some exams this week so maybe you did not pursue it further?)

thjazi39 commented 2 years ago

Concerning the resolution of this bug: as soon as my exams are finished, I will check whether there are other cases similar to this one in other languages (I already started a script). That should go quickly and avoid the inconvenience.
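For illustration only, and not the script mentioned here, a check of this kind could be sketched in Python as follows. The KEYWORDS table is a tiny hypothetical stand-in for the real translation files, and colliding_variables is an invented name.

```python
# Hypothetical keyword tables; the real ones would come from Hedy's translations.
KEYWORDS = {
    "en": {"ask", "print", "is", "sleep"},
    "nl": {"vraag", "print", "is", "slaap"},
}


def colliding_variables(program: str, languages: list[str]) -> set[str]:
    """Return variable names that are also keywords in one of the languages."""
    active = set().union(*(KEYWORDS[lang] for lang in languages))
    names = set()
    for line in program.splitlines():
        words = line.split()
        # Treat the word before "is" as a variable name (simplified assumption).
        if len(words) >= 2 and words[1] == "is":
            names.add(words[0])
    return names & active


example = "vraag is ask 'Wat wil je weten?'\nprint vraag"
print(colliding_variables(example, ["nl", "en"]))
```

Running this over every example program and every language pair would list the names that need a synonym.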

Then, for the resolution of this bug in a cleaner way, I see 2 solutions:

OnnoEbbens commented 2 years ago

I think this issue is a bit complicated and for me it is very much related to allowing multiple keyword languages and the discussion here #2080.

I like the first solution of discouraging keywords as variables. However, if we do this I also think we should fix the keyword language and not allow keywords in a second language. Otherwise people are discouraged from using certain words that collide with keywords they haven't seen before.

Since we allow multiple keyword languages I regularly get questions from students about why some words appear in red while they haven't learned these commands. Although easily explained, it is counterintuitive for them. On the other hand I have seen students use the switch NL/EN to see the translation of commands which might have helped them to understand the code better. So there are some advantages and disadvantages for fixing the keyword language and I can understand both arguments.

Personally I would like to be able to fix the keyword language. Or maybe only fix the keyword language if it is set to English. If the keyword language is set to Dutch I can understand if you allow English as a second language because in the end you will have to use English.

Felienne commented 2 years ago

Thanks for the extensive explanation @OnnoEbbens!

On the other hand I have seen students use the switch NL/EN to see the translation of commands which might have helped them to understand the code better. So there are some advantages and disadvantages for fixing the keyword language and I can understand both arguments.

That is a cool use case, I had not thought of that!

Personally I would like to be able to fix the keyword language.

I am not sure though what you mean here by "fix", do you mean fix as in "vastzetten" (pin)?

Or maybe only fix the keyword language if it is set to English.

Specifically this I don't understand, because in English the language is already fixed in the sense that it will only highlight English keywords.

We can also chat tomorrow of course!

OnnoEbbens commented 2 years ago

Yeah, sorry for the confusion, I see it's confusing. With 'fix' I meant to only allow this keyword language and no other keyword languages. So the last part becomes:

Personally I would like to be able to only use one keyword language. Or maybe only use one keyword language if the keyword language is set to English. If the keyword language is set to Dutch I can understand if you allow English and Dutch keywords because in the end you will have to use English.

Felienne commented 2 years ago

Personally I would like to be able to only use one keyword language. Or maybe only use one keyword language if the keyword language is set to English. If the keyword language is set to Dutch I can understand if you allow English and Dutch keywords because in the end you will have to use English.

I think this is exactly the way we have it now! In English you can only use (and only get highlighting for) English. In other languages, you can use that language (f.e. Dutch) plus English. But maybe I misunderstand your idea?

OnnoEbbens commented 2 years ago

Okay, I tried to explain it more clearly but maybe it's better if we discuss this tomorrow :)

  1. With these settings you can use both English and Dutch keywords:
    • language: Dutch
    • keyword language: English

I propose to only allow English keywords in this case.

  2. With these settings you can use both English and Dutch keywords:
    • language: Dutch
    • keyword language: Dutch

I propose to keep it that way.

Felienne commented 2 years ago

Ah now I get it, this makes total sense!! This would not be very hard to configure that way (tagging @TiBiBa so he can make an issue and a fix)

TiBiBa commented 2 years ago

I don't get it (yet). So we want to enable teachers to enforce showing only the English syntax highlighter? And if so, should this be aligned with the already implemented "hide keyword language switcher" setting?

Felienne commented 2 years ago

No, I think this is separate even from teachers. If kids enable English keywords, they should only be able to use English keywords (and highlighting). So even if the site is set to Dutch (or another lang), if you set your keywords to English, you get the same experience as "English English"
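The rule as described in this thread can be written down in a few lines of Python. This is only a sketch of the intended behavior, not the code of any fix; accepted_keyword_languages and the language codes are assumptions.

```python
def accepted_keyword_languages(site_lang: str, keyword_lang: str) -> set[str]:
    """Return the keyword languages that should be accepted and highlighted.

    The site language is deliberately ignored: only the keyword language counts.
    """
    if keyword_lang == "en":
        # English keywords chosen: same experience as "English English".
        return {"en"}
    # Any other keyword language: that language plus English.
    return {keyword_lang, "en"}


print(accepted_keyword_languages("nl", "en"))
print(accepted_keyword_languages("nl", "nl") == {"nl", "en"})
```

So a Dutch site with English keywords accepts only English, while a Dutch site with Dutch keywords keeps accepting both.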

TiBiBa commented 2 years ago

So we should also hide the keyword switcher when English is chosen as the keyword language? Currently the keyword switcher (as well as the syntax highlighter) is based on the web language, not the keyword language.

Felienne commented 2 years ago

Ah yes that is a consequence too

TiBiBa commented 2 years ago

Fixed in #2357!

Felienne commented 2 years ago

#2357 helps, but the issue in "pure English" still remains; @thjazi39 will have to fix the highlighter a bit more

thjazi39 commented 2 years ago

Okay, I tried to explain it more clearly but maybe it's better if we discuss this tomorrow :)

  1. With these settings you can use both English and Dutch keywords:
  • language: Dutch
  • keyword language: English

I propose to only allow English keywords in this case.

  2. With these settings you can use both English and Dutch keywords:
  • language: Dutch
  • keyword language: Dutch

I propose to keep it that way.

If I understand correctly, we have :

Or :

Or :

But not :

This means that in the syntax highlighting, the syntaxLang-XX.ts files will have to be modified, to take into account only the XX language

Felienne commented 2 years ago

I think #2357 takes care of that by only highlighting differently when another keyword lang is chosen

TiBiBa commented 2 years ago

I think #2357 takes care of that by only highlighting differently when another keyword lang is chosen

Indeed! That's exactly the approach we take in #2357.

thjazi39 commented 2 years ago

I think #2357 takes care of that by only highlighting differently when another keyword lang is chosen

what I meant is explained here: #2393

Felienne commented 2 years ago

Yes, this is a change that has already been taken care of.