Open samuelgfeller opened 1 week ago
If this is correct PHP then I don't think we properly fully support all types of PHP comments or all nuances of strings. Our grammar specifically says that single quoted strings may NOT contain any type of escape characters... where as \'
above appears to be an escape...
Thoughts?
Hmm thats a good question. But in the bug I tried to demonstrate the escape char is not relevant. I've just used it in the example so it stays valid php but without it or with normal quotes around that string, the same issue appears. Also if there is no escape at all, just php code and then one comment in a code block that contains a "don't" or "can't" all highlight breaks aftwards. Have you tried it in the demo?
Our grammar specifically says that single quoted strings may NOT contain any type of escape characters
That's interesting. To my knowledge a lot of programming/script languages (php, python, js, ruby, java, perl) support escape chars inside a single quoted string.
I've asked chatgpt about it and here is the answer on what can and cannot be escaped in single quote php strings:
In PHP, single-quoted strings support very limited escaping. Within single quotes, only two characters can be escaped with a backslash:
- Single Quote (\') - To include a single quote inside a single-quoted string.
- Backslash (\) - To include an actual backslash character.
Other escape sequences, like \n for newlines or \t for tabs, will not be interpreted in single-quoted strings in PHP. They will simply be treated as literal characters.
the escape char is not relevant. ... Also if there is no escape
It is very relevant - if we understood the escape we'd know that that quote isn't a terminator at all. The issue here is two fold:
These combined create the issues you are seeing. To fix this both php
and php-template
need to fully comprehend the rules for both comments and strings.
php
will need to support all comment types (it may already)php
will need to support strings fully (including limited escapes within '
strings)php-template
will need to understand all comment types (and skip
them)php-template
will need to understand all strings (and skip
them)The way sublanguage processing works (language within language) is almost like "two phase"... the first parsing pass has to consume the entire block as a single unit - and then pass it off to be reparsed by the second grammar. That means a lot of rules that half way understand the content (skipping strings, comments, etc), and just skip it - until we find the closing pattern. We have to skip strings and comments because a ?>
hidden within a string or comment wouldn't be a termination at all - so to prevent those false positives we have to match strings and comments in the "outer" grammar...
Have you tried it in the demo?
I use the developer tool, it's much more powerful than the demo. :-)
If I whip up a fix are you comfortable building from source and validating - perhaps throwing addition test cases at it trying to break it?
If you explain to me each step in detail I'd gladly do that! See me as a beginner.
Might want to start here:
https://highlightjs.readthedocs.io/en/latest/building-testing.html
Once you've built a node
build you can run markup tests (and write a few)... once you build a browser
build you can use the developer tool for testing. I'd suggest reading all the docs and looking at other grammars (and built in modes) for examples of how things work.
I'd start with php
, making sure it highlights all comments, strings, and escapes correctly... after that it should just be coping those key rules into php-template and making then scopeless/skip rules. Do one small change at a time, test, repeat.
For example you might first want to start with the inherited single quoted string, where it removes all the escapes by niling contains
- and then add the two needed escapes instead, etc...
Okay I built it locally and experimented with the developer tool.
I tried to write test cases that cover the missing highlighting of the escaped char (php) and single quote issue in the comments and escaped in a string (php-template).
In the phpstorm screenshot I show what would be correct.
The language php doesn't have an issue with ' in the comments. It skips it correctly.
?>
as part of the comment on the first line.<?php // The closing php tag on the same line is not part of this comment ?>
<?php
// I don't know if you want to highlight the escaped single quote and backslash differently
// as they are escaped and not part of the literal string.
// PHPStorm does highlight it differently, which I find very pleasing, but GitHub does not.
echo 'escaped \' and \n and \\';
?>
<?php
// Only variables are highlighted differently. Would be awesome to be able to distinguish escaped chars.
echo "\n is and \t are interpreted and therefore should not have the same color as \"string\".
The same goes for the escaped double quotes and $variables";
?>
Highlight.js
PHPStorm
I found out it's not only the single quote but also the double quote that breaks highlighting if escaped in a string or in a comment.
The quotes issue vanishes with the same snippet if "php" is chosen as language.
<?= "--> has no impact: ' " ?>
<?= '--> also has no impact: " ' ?>
<?= 'breaks it \'' ?>
<?= "breaks it \"" ?>
<?php // Escaped char \\ and \". \n \t not interpreted.
echo '\n \t \\ \"'; ?>
<?php // Escaped chars \n \t \\ \' all interpreted.
echo "\n \t \\"; ?>
<!-- HTML comment -->
<span spellcheck="false"><?= 'php string' ?></span>
Highlight.js PHPStorm Everything correct.
<?= "--> has no impact: ' " ?>
<?= '--> also has no impact: " ' ?>
<?php // breaks it ' ?>
<?php // breaks it " ?>
<!-- HTML comment -->
<span spellcheck="false"><?= 'php string' ?></span>
Highlight.js
PHPStorm
So seems for PHP we need to modify the line comment rule to also recognize ?>
as a comment terminator. I'd think we'd want a look-ahead regex there - since we just want to recognize the terminator, but we don't want to consume it as the end of the comment.
And of course we need to add contains
modes within string to recognize the escapes... you should be able to find examples from other languages or even the default modes.
Can you implement this fix? I don't have the capacity right now. I'll gladly test and try to break it afterwards.
I found something else for the php
language that I don't quite understand. The following code block is wrongly highlighted after the use ($extra) {
$extra = 'Some optional additional value for the closure';
$validator->add('title', 'custom', [
'rule' => function ($value, $context) use ($extra) {
// Custom logic that returns true if the validation passes and
// false if the error message below should be shown
},
'message' => 'The title is not valid'
]);
You really need to fire up the developer tool so you can see the spans visually.
Would that have helped you if I included the visual html spans in the screenshot or as text below? I can still do it but I also provided the code.
I was thinking it might help YOU. :)
Describe the issue An apostrophe
'
that exists on its own without being closed seems to break highlighting of everything that comes after it even if it's in a comment or escaped but only with the languagephp-template
. Withphp
it works fine as it should.Funnily if anywhere later this
'
is "closed" it works again.Which language seems to have the issue?
php-template
Are you using
highlight
orhighlightAuto
? Highlight, not auto.Sample Code to Reproduce Code: See expected behaviour
Not highlighted:
Correctly highlighted:
Expected behavior Github highlights the php bits:
I expect it to highlight the code correctly even if not every ' is closed. In a comment for instance, English words allow for shortcuts with ' for instance "don't" or "can't".
Additional observations
Its breaks only if the apostrophe is in PHP code (not HTML) and if it's not in a code block comment / such as this, that won't cause any troubles /. It's always everything that comes after it that is broken not before so if half the file doesn't contain a
'
and then suddenly a php part with a'
either in a// comment '
or in a string'string \' with an apostrophe'
the bottom half of the file is not highlighted.