s9e / TextFormatter

Text formatting library that supports BBCode, HTML and other markup via plugins. Handles emoticons, censors words, automatically embeds media and more.
MIT License

Reverse parser, ex HTML to BBCode #205

Closed by scorninpc 2 years ago

scorninpc commented 2 years ago

Is it possible to go in the reverse direction, for example to parse HTML into BBCode?

I've been reading the code looking for a method that does this, but without success.

JoshyPHP commented 2 years ago

No, the library accepts various markup languages (including HTML) as input but only supports HTML as output. You could cheat and create HTML templates that look like BBCodes, but that's a hack and subject to myriad corner cases you'd have to handle manually.

$configurator = new s9e\TextFormatter\Configurator;

$configurator->HTMLElements->allowElement('b');
$configurator->HTMLElements->allowElement('a');
$configurator->HTMLElements->allowAttribute('a', 'href');
// Accessing the property is enough to load the HTMLEntities plugin
$configurator->HTMLEntities;

// Templates must only be changed after the elements have been configured
$templates = [
    'a' => '[url="<xsl:value-of select="@href"/>"]<xsl:apply-templates/>[/url]',
    'b' => '[b]<xsl:apply-templates/>[/b]'
];
foreach ($templates as $elName => $template)
{
    $configurator->tags['html:' . $elName]->template = $template;
}
extract($configurator->finalize());

$text = '<b><a href="https://example.org">...</a></b>';
$xml  = $parser->parse($text);
$html = $renderer->render($xml);

// The renderer escapes special characters for HTML, so decode them to get plain BBCode
die(htmlspecialchars_decode("$html\n"));

Output:

[b][url="https://example.org"]...[/url][/b]

Your best bet may be to simply load your HTML document into a DOMDocument, iterate over elements, manually add the BBCode markup then retrieve the textContent of the whole tree.
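
For illustration, here is a minimal sketch of that DOMDocument approach. It is not part of s9e\TextFormatter: the function name `htmlToBBCode` is hypothetical, and it only handles the `b` and `a` elements from the example above. It assumes UTF-8 input and does no escaping of BBCode-like text already present in the document, so the corner cases mentioned earlier still have to be handled manually.

```php
<?php
// Hypothetical helper, not a library API: wraps the content of <b> and
// <a href="..."> elements in BBCode markup, then returns the plain text.
function htmlToBBCode(string $html): string
{
    $dom = new DOMDocument;

    // The XML declaration is a common trick to make libxml treat the input
    // as UTF-8. The flags silence warnings about imperfect markup.
    $dom->loadHTML(
        '<?xml encoding="UTF-8">' . $html,
        LIBXML_NOERROR | LIBXML_NOWARNING
    );

    // XPath results are a static list, so the tree can be modified safely
    // while iterating over them.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//b | //a[@href]') as $el)
    {
        if ($el->nodeName === 'a')
        {
            $open  = '[url="' . $el->getAttribute('href') . '"]';
            $close = '[/url]';
        }
        else
        {
            $open  = '[b]';
            $close = '[/b]';
        }

        // Add the BBCode markup as text nodes around the element's content
        $el->insertBefore($dom->createTextNode($open), $el->firstChild);
        $el->appendChild($dom->createTextNode($close));
    }

    // loadHTML() adds the implied <html> and <body> elements; the text
    // content of <body> is the original text plus the inserted markup.
    $body = $dom->getElementsByTagName('body')->item(0);

    return ($body === null) ? '' : $body->textContent;
}
```

Called with the same input as the example above, `htmlToBBCode('<b><a href="https://example.org">...</a></b>')` returns `[b][url="https://example.org"]...[/url][/b]`. Supporting more elements would mean extending the XPath query and the mapping from element names to BBCode tags.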

scorninpc commented 2 years ago

Thank you for the reply.

You understood what I need. Your DOMDocument solution sounds acceptable. I'll try that approach and see how it goes.

Thank you