Closed ruudk closed 3 months ago
To me, this should directly trigger E_USER_DEPRECATED
(or use the trigger_deprecation
of the symfony/deprecation-contracts if we already use it as the way to trigger deprecations).
And this should go in the 3.x branch (so that 4.0 can ship with the new behavior instead of having to wait until 5.0)
Thanks for the review. I think it makes sense. I will change this PR to target v3 and make sure we only deprecate and not introduce new functionality.
Then for v4 we can do the other change.
Oke, I updated the PR to target V3. We don't change the behavior. We only deprecate, using trigger_deprecation
, when we detect an unnecessary escape character.
When this is merged, v4 is rebased with v3, I can provide the PR that changes the behavior there.
Please have another look when you have time ð
I refactored the code a bit and added more test cases. I think good to carefully review these test cases. Let me know if you think I can add more cases.
Also updated the docs.
Alright, I think I'm done. I added more documentation related to Octal and Hexadecimal escape sequences.
@fabpot what do you think of this? Anything we should change or add more tests?
I've just run a small performance test via the script below and (maybe as expected) the hit can be significant:
<?php
require_once __DIR__.'/vendor/autoload.php';
$string = str_repeat("æ¥æ¬ã§ã¯ãæ¥ã«ãªããšæ¡ã®è±ãå²ããŸããå€ãã®äººã
ã¯ãå
¬åãå·ã®è¿ãã«éãŸãããè±èŠã楜ãã¿ãŸããæ¡ã®è±ã³ãã颚ã«èãããŸãã§éªã®ããã«èŠããç¬éã¯ããšãŠãçŸããã§ãã", 10);
$loader = new Twig\Loader\ArrayLoader([
'index.twig' => <<<EOF
{{ "$string" }}
EOF
,
]);
$twig = new Twig\Environment($loader);
for ($i = 0; $i < 10000; $i++) {
$twig->tokenize(new Twig\Source($twig->getLoader()->getSourceContext('index.twig')->getCode(), 'index.twig'));
}
On my machine, ~100ms for the 3.x branch vs ~900ms for this branch.
But I suppose that this kind of long string is not very common. I would expect "hardcoded" strings to be quite short in templates.
Maybe we could optimize the code a bit. First check if the string contains a backslash, then run the method. Otherwise don't do anything?
Maybe we could optimize the code a bit. First check if the string contains a backslash, then run the method. Otherwise don't do anything?
Sure, as we have a perf script now, we can try a few obvious strategies to see their impact. But then again, let's optimize for short strings first and make sure that long strings are not too slow to lex.
I ran the bench on my machine:
// v3 branch
Benchmark 1: php bench.php
Time (mean ± Ï): 110.2 ms ± 3.7 ms [User: 98.5 ms, System: 7.4 ms]
Range (min ⊠max): 108.6 ms ⊠126.9 ms 23 runs
// this branch
Benchmark 1: php bench.php
Time (mean ± Ï): 1.018 s ± 0.010 s [User: 0.991 s, System: 0.009 s]
Range (min ⊠max): 1.007 s ⊠1.032 s 10 runs
When I change the stripcslashes
method:
private function stripcslashes(string $str, string $quoteType): string
{
+ if (!str_contains($str, '\\')) {
+ return $str;
+ }
+ // ...
it becomes this:
// this branch
Benchmark 1: php bench.php
Time (mean ± Ï): 95.7 ms ± 1.1 ms [User: 84.9 ms, System: 7.5 ms]
Range (min ⊠max): 94.5 ms ⊠99.0 ms 29 runs
But obviously, that's because the bench does not contain any escape characters.
So let's change the bench to:
<?php
require_once __DIR__.'/vendor/autoload.php';
$string = str_repeat("æ¥æ¬ã§ã¯ãæ¥ã«ãªããšæ¡ã®è±ãå²ããŸãã\nå€ãã®äººã
ã¯ãå
¬åãå·ã®è¿ãã«éãŸãããè±èŠã楜ãã¿ãŸãã\\næ¡ã®è±ã³ãã颚ã«èãããŸãã§éªã®ããã«èŠããç¬éã¯ããšãŠãçŸããã§ãã", 10);
$loader = new Twig\Loader\ArrayLoader([
'index.twig' => <<<EOF
{{ "$string" }}
EOF
,
]);
$twig = new Twig\Environment($loader);
for ($i = 0; $i < 10000; $i++) {
$twig->tokenize(new Twig\Source($twig->getLoader()->getSourceContext('index.twig')->getCode(), 'index.twig'));
}
Now we get
// this branch
Benchmark 1: php bench.php
Time (mean ± Ï): 1.026 s ± 0.002 s [User: 0.999 s, System: 0.008 s]
Range (min ⊠max): 1.024 s ⊠1.029 s 10 runs
So the str_contains
check is making it a bit slower when there is a backslash found. But when there is not, it can return the raw string directly, making it even faster than v3.
@fabpot Applied the feedback and also added the str_contains
optimization.
Thank you @ruudk.
Thanks! I will prep the next PR when v4 is synced.
Hi @ruudk ... i may have a version that reduces the runtime difference, using chunks to avoid looping on every char.
It passes the tests, but i'm not sure if this goes in the direction you wanted for it.
So if you want to check / benchmark it:
private function stripcslashes(string $str, string $quoteType): string
{
if (!str_contains($str, '\\')) {
return $str;
}
$result = '';
$length = strlen($str);
$specialChars = [
'f' => "\f",
'n' => "\n",
'r' => "\r",
't' => "\t",
'v' => "\v",
];
$i = 0;
while ($i < $length) {
if (false === $pos = strpos($str, '\\', $i)) {
$result .= substr($str, $i);
break;
}
$result .= substr($str, $i, $pos - $i);
$i = $pos + 1;
if ($i >= $length) {
$result .= '\\';
break;
}
$nextChar = $str[$i];
if (isset($specialChars[$nextChar])) {
$result .= $specialChars[$nextChar];
} elseif ($nextChar === '\\' || $nextChar === $quoteType) {
$result .= $nextChar;
} elseif ($nextChar === '#' && $i + 1 < $length && $str[$i + 1] === '{') {
$result .= '#{';
$i++;
} elseif ($nextChar === 'x' && $i + 1 < $length && ctype_xdigit($str[$i + 1])) {
$hex = $str[++$i];
if ($i + 1 < $length && ctype_xdigit($str[$i + 1])) {
$hex .= $str[++$i];
}
$result .= chr(hexdec($hex));
} elseif (ctype_digit($nextChar) && $nextChar < '8') {
$octal = $nextChar;
while ($i + 1 < $length && ctype_digit($str[$i + 1]) && $str[$i + 1] < '8' && strlen($octal) < 3) {
$octal .= $str[++$i];
}
$result .= chr(octdec($octal));
} else {
trigger_deprecation('twig/twig', '3.12', sprintf('Character "%s" at position %d does not need to be escaped anymore.', $nextChar, $i + 1));
$result .= $nextChar;
}
$i++;
}
return $result;
}
@smnandre smart! Will check it out today
Created the PR to optimize this further: https://github.com/twigphp/Twig/pull/4196
This is a first attempt at solving #4123 and #2712.
Currently it writes the deprecations to an array in the Lexer. This is probably not the way to do it. Should we directly trigger
E_USER_DEPRECATED
errors?/cc @stof @fabpot Let me know what you think ð