s9e / TextFormatter

Text formatting library that supports BBCode, HTML and other markup via plugins. Handles emoticons, censors words, automatically embeds media and more.
MIT License

[MediaEmbed] urlencode captured value #221

Closed rxu closed 1 year ago

rxu commented 1 year ago

Some links contain non-Latin characters (as in IDN format), for example https://music.apple.com/ru/playlist/песни-о-любви-на-русском-главное/pl.764df3a81ac74382b4d6a8ad3b32e850. When such a link is captured with a regex like !//music.apple.com/(?'country'[a-z]{2})/playlist/(?'plname'[\\pL\\d\\-%A-F]+)/(?'plid'pl\\.[\\w\\d]+)!, plname contains песни-о-любви-на-русском-главное. But the embed link needs that value urlencoded, as in this embed URL for another playlist: //embed.music.apple.com/ru/playlist/%D0%BF%D0%BE%D0%BF-%D0%BA%D0%B0%D1%80%D0%B0%D0%BC%D0%B5%D0%BB%D1%8C/pl.e33abdab5c3a4b65bc32f635a281e668.

Is there a proper way to urlencode captured values using XSL functions, or will it require changing the template? Or is it impossible without using PHP?

Thanks.

JoshyPHP commented 1 year ago

You'd have to use an attribute filter. There's an example in the repository of a media site using urldecode on a name attribute: https://github.com/s9e/TextFormatter/blob/b06918f8f0ea9bbef7ddddf1d6347920e3b75328/src/Plugins/MediaEmbed/Configurator/sites/googleplus.xml#L8-L10 (yours would use urlencode)

Or you can use the built-in #url filter if you want the whole value to be treated as a URL. For example: https://github.com/s9e/TextFormatter/blob/b06918f8f0ea9bbef7ddddf1d6347920e3b75328/src/Plugins/MediaEmbed/Configurator/sites/odysee.xml#L6-L9
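
For reference, here is a minimal sketch of what percent-encoding the captured value looks like in plain PHP, outside TextFormatter. It uses PHP's rawurlencode() as an assumption about the encoding the embed URL needs; it is not the library's filter implementation, and the way the embed URL is assembled is only illustrative.

```php
<?php
// Playlist name as it would be captured from the URL in the report above.
$plname = 'песни-о-любви-на-русском-главное';

// rawurlencode() percent-encodes every byte except A-Z a-z 0-9 - _ . ~
$encoded = rawurlencode($plname);

// Illustrative only: assemble an embed URL by hand from the encoded value.
$embed = '//embed.music.apple.com/ru/playlist/' . $encoded . '/pl.764df3a81ac74382b4d6a8ad3b32e850';

echo $embed, "\n";
```

The hyphens stay as they are, and each Cyrillic letter becomes two percent-encoded bytes because it is two bytes long in UTF-8.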

rxu commented 1 year ago

Thanks a lot. I tried it, but it didn't help in my case; the issue is probably unrelated.

JoshyPHP commented 1 year ago

Did you forget the u modifier in the regexp? I have this, but I haven't checked the output for correctness.

$configurator = new s9e\TextFormatter\Configurator;

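$configurator->MediaEmbed->add(
    'applemusic',
    [
        'host'    => 'music.apple.com',
        // The trailing u modifier enables UTF-8 mode, so \pL matches the Cyrillic letters
        'extract' => "!//music.apple.com/(?'country'[a-z]{2})/playlist/(?'plname'[\\pL\\d\\-%A-F]+)/(?'plid'pl\\.[\\w\\d]+)!u",
        'iframe'  => ['src' => '//embed.music.apple.com/{@country}/playlist/{@plname}/{@plid}'],
        // Run the captured plname through the built-in #url filter
        'attributes' => ['plname' => ['filterChain' => '#url']]
    ]
);

// finalize() returns the parser and renderer; extract() exposes them as $parser and $renderer
extract($configurator->finalize());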

$text = 'https://music.apple.com/ru/playlist/песни-о-любви-на-русском-главное/pl.764df3a81ac74382b4d6a8ad3b32e850';
$xml  = $parser->parse($text);

die("$xml\n");

<r><APPLEMUSIC country="ru" plid="pl.764df3a81ac74382b4d6a8ad3b32e850" plname="%D0%BF%D0%B5%D1%81%D0%BD%D0%B8-%D0%BE-%D0%BB%D1%8E%D0%B1%D0%B2%D0%B8-%D0%BD%D0%B0-%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%BC-%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%BE%D0%B5">https://music.apple.com/ru/playlist/песни-о-любви-на-русском-главное/pl.764df3a81ac74382b4d6a8ad3b32e850</APPLEMUSIC></r>

rxu commented 1 year ago

Wow, that indeed fixed it. As usual, the clue was closer than where I was looking. I'm not even sure urlencode is needed at all, as it works without it. Thanks a bunch again.
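
The effect of the u modifier discussed in this thread can be checked in isolation with plain preg_match(), without TextFormatter. The sketch below reuses the regex and URL from the thread; the variable names are illustrative. Without the modifier, PCRE reads the subject byte by byte, so the multibyte Cyrillic letters should not satisfy \pL as whole characters and the pattern is expected to fail. With the modifier, the pattern should match.

```php
<?php
$url = 'https://music.apple.com/ru/playlist/песни-о-любви-на-русском-главное/pl.764df3a81ac74382b4d6a8ad3b32e850';

// Same pattern as in the thread, with the trailing modifier left off.
$pattern = "!//music.apple.com/(?'country'[a-z]{2})/playlist/(?'plname'[\\pL\\d\\-%A-F]+)/(?'plid'pl\\.[\\w\\d]+)!";

// Byte mode: the subject is treated as a sequence of single bytes.
$withoutU = preg_match($pattern, $url, $m1);

// UTF-8 mode: appending "u" makes PCRE decode the subject as UTF-8.
$withU = preg_match($pattern . 'u', $url, $m2);

var_dump($withoutU, $withU);
echo $m2['plname'] ?? '(no match)', "\n";
```

This matches what was observed above: once the regex matched, the plname attribute was populated and the embed worked.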