houseabsolute / Markdent

An event-based Markdown parser toolkit
http://metacpan.org/release/Markdent/

Sticky HTML entities #3

Closed imago-storm closed 9 years ago

imago-storm commented 9 years ago

Hello, it's me again. It seems that the regexp for HTML entities is too greedy. The text &laquo;word&raquo; becomes a tree like this:

$VAR1 = [
  [
    {
      "type" => "paragraph"
    },
    [
      {
        "type" => "html_entity",
        "entity" => "laquo;word&raquo"
      },
      {
        "type" => "text",
        "text" => "\n"
      }
    ]
  ]
];
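
To illustrate what "too greedy" means here, the following is a minimal standalone sketch. Both patterns are hypothetical and written only for this illustration; they are not taken from Markdent's source, and the actual fix may look different. The first pattern lets the entity name run through any non-whitespace characters, so it swallows everything up to the last semicolon, which reproduces the "laquo;word&raquo" value seen in the tree above. The second restricts the name to word characters or a numeric reference, so each entity is matched separately.

```perl
use strict;
use warnings;

# Hypothetical patterns for illustration only; NOT copied from Markdent.
# Greedy: \S+ consumes as much as possible, so the match ends at the LAST ';'.
my $greedy = qr/&(\S+);/;

# Bounded: the name may only be word characters or a numeric reference,
# so the match ends at the FIRST ';'.
my $bounded = qr/&(#\d+|#[xX][0-9a-fA-F]+|\w+);/;

my $text = '&laquo;word&raquo;';

# Single capture from the greedy pattern.
my ($greedy_match) = $text =~ $greedy;

# All captures from the bounded pattern, one per entity.
my @bounded_matches = $text =~ /$bounded/g;

print "greedy:  $greedy_match\n";
print "bounded: @bounded_matches\n";
```

With the bounded pattern, the text between the two entities ("word") is left unmatched, so a parser built this way could emit it as an ordinary text node.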

Code sample:

use Tree::Simple::Visitor::ToNestedArray;
use Data::Dumper;
use Markdent::Parser;
use Markdent::Handler::MinimalTree;

my $visitor = Tree::Simple::Visitor::ToNestedArray->new;
my $handler = Markdent::Handler::MinimalTree->new;

my $parser = Markdent::Parser->new(
    handler => $handler,
);

my $markdown = "&laquo;word&raquo;";
$parser->parse( markdown => $markdown );
my $tree = $handler->tree;

$tree->accept($visitor);
my $array_tree = $visitor->getResults();

local $Data::Dumper::Indent = 1;
local $Data::Dumper::Useqq = 1;
print Dumper $array_tree;

autarch commented 9 years ago

I merged this from the CLI. Thanks!