jbboehr / php-mustache

Mustache PHP Extension
MIT License
56 stars 22 forks source link

Lambda to return text instead of parsable template syntax #68

Open Krinkle opened 3 months ago

Krinkle commented 3 months ago

I spent a good amount of time today figuring out why a certain code path works fine during development with https://github.com/bobthecow/mustache.php, but crashes in a container for production, where we use php-mustache.

The repro reduces down to the following:

$templateContent = '
<section>
    {{#lines}}
    <code>{{{html}}}</code>
    {{/lines}}
</section>
';

$text = trim( <<<TEXT
yle={{ // eslint-disable-line react/prop-types\n    padding: \"1em\",\n    \"color\": \"#aaa\"\n  }}> <i>Could not r
TEXT );
$html = static function () use ( $text ) {
    return $text;
};
$data = [
    'lines' => [
        [
            'html' => $html,
        ],
    ],
];
$partials = [];

if ( class_exists( Mustache::class ) ) {
    // Prefer native php-mustache (PECL) when available
    $mustache = new Mustache();
    return $mustache->render( $templateContent, $data, $partials );
} else {
    // Fallback to composer bobthecow/mustache.php for local dev server
    $mustache = new Mustache_Engine( [
        'entity_flags' => ENT_QUOTES,
        'partials' => $partials,
    ] );
    return $mustache->render( $templateContent, $data );
}
Fatal error: Uncaught MustacheException: Reached bottom of stack
in /Users/krinkle/Development/labs-codesearch/frontend/index.php:53
Stack trace:
#0 /Users/krinkle/Development/labs-codesearch/frontend/index.php(53): Mustache->render('\n<section>\n\t{{#...', Array, Array) 

Background

The above was reduced down from https://github.com/wikimedia/labs-codesearch . The live example works at the moment, because it's rendering the results client-side in JavaScript. I'm working on moving this to the backend. In doing so, 99% of search queries rendered without issue and render identically in pixel-perfect manner. But for one rare query, with very large and numerous results, it would crash. I mistakenly attributed this to result size given that all shorter results worked fined, but it turned out the larger result size happen to contain the above unusual text to display, which is rather Mustache-like.

The function in question normally performs a regular expression, and highlights portions that match your search query and then HTML-escapes it. https://gerrit.wikimedia.org/g/labs/codesearch/+/refs/changes/48/1056248/5/frontend/src/Model.php#394

When doing this eagerly, and setting the string in the Mustache data, it all works fine. But when doing the computation and string allocations lazily, it started failing for this one unusual input.

I failed to understand that this error related to something in with "my" template stack, because I had already ruled out all partials or and any unbalanced sections in "my" code. But it turns out there was some Mustache-like syntax in one of the results.

Question: Is there a way to disable parsing/interpretation of the lambda return value as anything other than a data value to be displayed as {{x}} or {{{x}}} accordingly? This would be great for interoperability.

Krinkle commented 3 months ago

In reading up at https://github.com/bobthecow/mustache.php/issues/415, I realized that https://mustache.github.io/mustache.5.html specifies the current behaviour of php-mustache. See also https://github.com/bobthecow/mustache.php/pull/402.

I believe a workaround would be to return an array, except that doesn't work in mustache.php (yet).

jbboehr commented 3 months ago

Would prepending a delimiter change to the lambda return value be a decent workaround?

This is an unfortunate behavior of the spec, luckily handlebars didn't make the same mistake, IIRC.

We could maybe add a wrapper object to disable parsing the rvalue, kind of like SafeString: https://github.com/jbboehr/php-handlebars/blob/d909c634a0af1b9ae80415fc3777bb4d2d056fc0/handlebars.stub.php#L127

Krinkle commented 3 months ago

I'll try the toString object as local workaround for now first. I'd rather not use delimiters as that seems a bit more risky when displaying user-generated content. Thanks!

Krinkle commented 3 months ago

Hm.. so while briefly mentioned in https://github.com/jbboehr/php-mustache/issues/3, it seems objects with __toString are not currently supported in php-mustache. It silently renders the empty string instead. It does work in mustache.php at the moment.

<?php

class LazyString {
    public function __construct(
        private $value
    ) {}
    public function __toString() {
        return 'hello <em>there</em> world';
    }
}

$partials = [];
$templateContent = '
<code>{{{html}}}</code>
<ul>
    {{#lines}}
    <li><code>{{{html}}}</code></li>
    {{/lines}}
</ul>
';

$data = [
    'html' => new LazyString(),
    'lines' => [
        [
            'html' => new LazyString(),
        ],
    ],
];

// php-mustache
$mustache = new Mustache();
print $mustache->render( $templateContent, $data, $partials );

// mustache.php
// $mustache = new Mustache_Engine( [
//  'entity_flags' => ENT_QUOTES,
//  'partials' => $partials,
// ] );
// print $mustache->render( $templateContent, $data );
<code></code>
<ul>
    <li><code></code></li>
</ul>

I did also try it with a method instead, but the result of that is similarly evaluated as Mustache template syntax rather than a value string like the same string directly in $data would be.

class LazyString {
    public function __construct(
        private $value
    ) {}
    public function getHtml() {
        return $this->value;
    }
}

$templateContent = '
<code>{{{html.getHtml}}}</code>
<ul>
    {{#lines}}
    <li><code>{{{html.getHtml}}}</code></li>
    {{/lines}}
</ul>
';
jbboehr commented 2 months ago

Did you manage to find a workaround?

What I meant was we could add a new class to this extension (would have to polyfill it for mustache.php though), such that when a lambda returns it, it disables the mustache execution on it.

We could also add a flag to disable the behavior altogether or something.

Krinkle commented 1 month ago

@jbboehr My workaround is:

        static $delimiterPragma = "{{=`\"'\x7f\xca \xca\x7f'\"`=}}";
        return $delimiterPragma . strtr( $value, [ "\xca" => '' ] );

Details at https://gerrit.wikimedia.org/r/c/labs/codesearch/+/1062418/2/frontend/src/Model.php#441

Which I believe is robust in in dealing with user-generated URL query strings, HTML, XML, Markdown, JSON, and hopefully anything that can be transferred over HTTP or read from a file. I still don't like it, but once I pick this side project back up, I probably would deploy with this. A safe wrapper class would be great, yeah. And yeah, it'd be great if we can coordinate it with mustache.php for interoperability.