s9e / TextFormatter

Text formatting library that supports BBCode, HTML and other markup via plugins. Handles emoticons, censors words, automatically embeds media and more.
MIT License

Dynamically adding rel to URL tag #195

Closed luceos closed 2 years ago

luceos commented 2 years ago

Hi @JoshyPHP

For @Flarum I am looking at a way to set the rel value on the URL tag. Our core package instantiates the formatter with the default:

            $rel = $a->getAttribute('rel');
            $a->setAttribute('rel', "$rel nofollow ugc");

Now I'm trying to extend this logic so that links pointing to the same domain will have no rel (implying "dofollow"). So far I tried appending another attribute to the template like so:

        /** @var Configurator\Items\Template $template */
        $template = $configurator->tags['URL']->template;
        $dom = $template->asDOM();

        /** @var Element $a */
        foreach ($dom->getElementsByTagName('a') as $a) {
            $a->removeAttribute('rel');

            $a
                ->prependXslIf('$BLANK')
                ->appendXslAttribute('target', '_blank');

            $a
                ->prependXslIf('$FOLLOW and $BLANK')
                ->appendXslAttribute('rel', 'noopener');
            $a
                ->prependXslIf('$FOLLOW')
                ->appendXslAttribute('rel', '');

            $a
                ->prependXslIf('not($FOLLOW) or $FOLLOW=0')
                ->appendXslAttribute('rel', 'ugc noopener');
        }

        $dom->saveChanges();

I am then trying to inject these values within the renderer like so:

    public function __invoke(Renderer $renderer, $context, string $xml)
    {
        $xml = Utils::replaceAttributes($xml, 'URL', function ($attributes) {
            $uri = isset($attributes['url'])
                ? new Uri($attributes['url'])
                : null;

            $attributes['FOLLOW'] = $this->url->getHost() === $uri?->getHost() ? 1 : 0;
            $attributes['BLANK'] = $this->url->getHost() !== $uri?->getHost() ? 1 : 0;

            return $attributes;
        });

        return $xml;
    }

The attributes do not seem to cascade into the template. These issues I encountered:

Now my assumption is that attributes are not the same as arguments in that context. But arguments seem to apply to the whole render call instead of on a per-tag basis. Can you help me understand the underlying TextFormatter logic?

PS I also tried using a normalizer, but I think that would be the wrong approach as the changes rely on the context given?

Thank you so much for your time and this excellent package.

Ref: https://github.com/luceos/flarum-ext-dofollow

JoshyPHP commented 2 years ago

Symbols that start with $ are template parameters. They are global and can't be changed during the rendering process. What you're trying to use are attributes, which are set per element in the XML. On that note, while I find it wonderful that all of the XML input, HTML output and the XSL template that transforms one into the other use the DOM, it also means that the term "attribute" can refer to any of those three contexts so let me know if any part is unclear.

If the goal is to have different values for each link, then you'll want to use attributes: instead of $FOLLOW you would use something like @FOLLOW. Note that regular attributes set by the library are normalized to lower case. That doesn't stop you from managing your own by modifying the XML, but it's worth mentioning.

In your case, I think it would be simpler to forgo most/all of the template logic and decide in your __invoke method what the values for rel and target should be. If you create those attributes in the XML, you can copy them into the HTML output with XSL's xsl:copy-of element. Here's a quick proof of concept:

use s9e\TextFormatter\Utils;

// Load the Autolink plugin, which creates the URL tag used below
$configurator = new s9e\TextFormatter\Configurator;
$configurator->Autolink;

function __invoke(string $xml)
{
    $xml = Utils::replaceAttributes($xml, 'URL', function ($attributes) {
        $attributes['rel']    = 'rel_goes_here';
        $attributes['target'] = '_blank';

        return $attributes;
    });

    return $xml;
}

/** @var Configurator\Items\Template $template */
$template = $configurator->tags['URL']->template;
$dom = $template->asDOM();

/** @var Element $a */
foreach ($dom->getElementsByTagName('a') as $a) {
    $a->prependXslCopyOf('@target');
    $a->prependXslCopyOf('@rel');
}
$dom->saveChanges();

$dom->formatOutput = true;
echo "Template:\n", $dom->saveXML($dom->firstOf('//a')), "\n\n";

extract($configurator->finalize());

$text = 'http://example.org';
$xml  = $parser->parse($text);
echo "Original XML:\n$xml\n\n";

$html = $renderer->render($xml);
echo "Output:\n$html\n\n";

$xml  = __invoke($xml);
echo "Modified XML:\n$xml\n\n";

$html = $renderer->render($xml);
echo "Output:\n$html\n";

Template:
<a href="{@url}">
  <xsl:copy-of select="@rel"/>
  <xsl:copy-of select="@target"/>
  <xsl:apply-templates/>
</a>

Original XML:
<r><URL url="http://example.org">http://example.org</URL></r>

Output:
<a href="http://example.org">http://example.org</a>

Modified XML:
<r><URL rel="rel_goes_here" target="_blank" url="http://example.org">http://example.org</URL></r>

Output:
<a href="http://example.org" rel="rel_goes_here" target="_blank">http://example.org</a>

Note that when copying an attribute, its value cannot be modified. In theory, I believe that XSLT may allow you to overwrite an attribute but it's a bit dicey, and on top of that the native PHP renderer in the library won't handle it gracefully. That's why if you need a default value, you'll be better off setting it in PHP because it's simpler (less templating logic) and immediately obvious to other maintainers.
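
For the same-host case raised earlier in this thread, the decision could be made in plain PHP before the attributes are written into the XML. The following is only a minimal sketch: `linkAttributesFor()` is a hypothetical helper and `example.org` an assumed site host, neither of which is part of the library, and the external `rel` value is an illustrative choice.

```php
<?php
/**
 * Hypothetical helper, not part of s9e\TextFormatter: pick the rel and target
 * values for a link by comparing its host to the site's own host.
 */
function linkAttributesFor(string $url, string $siteHost): array
{
    // parse_url() returns null if the URL has no host, false if it is malformed
    $host = parse_url($url, PHP_URL_HOST);

    if (is_string($host) && strcasecmp($host, $siteHost) === 0)
    {
        // Same host: return no rel and no target, so nothing gets copied
        return [];
    }

    // Everything else is treated as an external, user-submitted link
    return ['rel' => 'nofollow ugc noopener', 'target' => '_blank'];
}
```

Inside the `Utils::replaceAttributes()` callback shown above, the result could be merged into the tag's attributes, for instance with `$attributes + linkAttributesFor($attributes['url'], 'example.org')`. Links without a recognizable host fall into the external branch here, which is the conservative default.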

Let me know if I missed anything and feel free to tag me in relevant discussions.

luceos commented 2 years ago

Excellent 🤟 This helped render the right values within a minute or two.

Now I just need to introduce this in the best way inside Flarum. I don't think overriding a previously set attribute @ works. Would it make sense for Flarum to move the default nofollow ugc it has in core into a Normalizer? That should enable the code I have now to override that, right?

JoshyPHP commented 2 years ago

If you want to have exactly nofollow ugc (a static value) on all a elements created via markup, then yes it would be better as a Normalizer. If you want to have a dynamic rel value on all a elements but that value is not determined by attributes from the tag itself and does not require processing the XML, then a Normalizer is fine too.

On the other hand, if you want all a elements to have a rel based on their href, then it will be simpler to postprocess the HTML directly. On that note, the PHP renderer always outputs attributes in double quotes and always leaves exactly one space (U+0020) before the attribute's name. Because its output is consistent, it should be safe to process as a string; you don't have to load it in a DOM.
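
As a rough illustration of treating the output as a string, the sketch below appends a rel attribute only to links whose href points at another host. It relies on the output format described above (double-quoted values, one space before each attribute name) and assumes the element does not already carry a rel attribute. The function name `addRelToExternalLinks()` and the site host are hypothetical.

```php
<?php
// Hypothetical helper: append rel="nofollow ugc" to <a> elements whose href
// points at a host other than $siteHost. Assumes no rel attribute exists yet.
function addRelToExternalLinks(string $html, string $siteHost): string
{
    $result = preg_replace_callback(
        // Match the start tag of an <a> element up to, but excluding, its ">"
        '(<a(?= )[^>]*? href="([^"]*+)"[^>]*+)',
        function (array $m) use ($siteHost): string
        {
            // Attribute values are HTML-encoded in the output, decode before parsing
            $host = parse_url(htmlspecialchars_decode($m[1]), PHP_URL_HOST);
            if (is_string($host) && strcasecmp($host, $siteHost) === 0)
            {
                // Same host: leave the tag untouched
                return $m[0];
            }

            return $m[0] . ' rel="nofollow ugc"';
        },
        $html
    );

    // preg_replace_callback() returns null on error, fall back to the input
    return $result ?? $html;
}
```

If links may already have a rel attribute, the existing value would need to be parsed and merged rather than appended, which this sketch does not attempt.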

JoshyPHP commented 2 years ago

For what it's worth, I was wondering how much code it takes to postprocess the HTML generated by the PHP renderer and I wrote the following. Feel free to modify and reuse as necessary, no attribution required.

$html = '<a href="...">';
$html = preg_replace_callback('(<a(?= )\\K[^>]++)', 'replaceLinkAttributes', $html);
var_dump($html);

function replaceLinkAttributes(array $m): string
{
    $html = $m[0];

    // Match all attribute declarations. $html starts with a space
    preg_match_all('( ([^=]++)="([^"]*+)")', $html, $m);

    // Create a map of current attributes, the values are left HTML-encoded
    $attributes = array_combine($m[1], $m[2]);

    if (!isset($attributes['href']))
    {
        // No href? megamind.jpg
        return $html;
    }

    // Use existing rel values as keys, the array's values don't matter
    $rel = [];
    if (isset($attributes['rel']))
    {
        preg_match_all('(\\S++)', $attributes['rel'], $m);
        $rel += array_flip($m[0]);
    }

    // Add/remove rel entries as needed
    $rel['ugc']      = 0;
    $rel['nofollow'] = 0;

    // Replace the rel attribute before serializing attributes back into a string
    $attributes['rel'] = implode(' ', array_keys($rel));

    $html = '';
    foreach ($attributes as $k => $v)
    {
        $html .= " $k=\"$v\"";
    }

    return $html;
}

JoshyPHP commented 2 years ago

I'm closing this issue. Feel free to tag me in related discussions.