kivikakk / comrak

CommonMark + GFM compatible Markdown parser and renderer

`sourcepos` is incorrect for `<script>` tags #448

Open GabeIsman opened 3 months ago

GabeIsman commented 3 months ago

I've encountered an odd issue. I'm using the Ruby wrapper (Commonmarker), and inspecting the AST shows strange `sourcepos` values for `<script>` tags. AFAICT this only affects script tags, and only ones that open and close on a single line.

Commonmarker.parse("<script></script>")
=> #<Commonmarker::Node(document):
  source_position={:start_line=>1, :start_column=>1, :end_line=>1, :end_column=>17}
  children=[#<Commonmarker::Node(html_block):
       source_position={:start_line=>1,
        :start_column=>1,
        :end_line=>0,
        :end_column=>0}>]>

Note that `end_line` and `end_column` are both zero. In general I've observed that `end_line` is `start_line - 1`, and `end_column` is always zero.
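To make the symptom concrete, here is a minimal sketch of the invariant the output above violates: with 1-based, inclusive positions, a node should not end before it starts. The helper `well_formed_sourcepos?` is hypothetical (it is not part of Commonmarker or Comrak), and the two hashes are copied from the output above.

```ruby
# Hypothetical helper, not part of Commonmarker: checks that a
# source_position hash (1-based, inclusive) does not end before it starts.
def well_formed_sourcepos?(pos)
  start  = [pos[:start_line], pos[:start_column]]
  finish = [pos[:end_line], pos[:end_column]]
  # Arrays compare lexicographically: line first, then column.
  start.all? { |n| n >= 1 } && (finish <=> start) >= 0
end

# Values copied from the parse output above.
document_pos   = { start_line: 1, start_column: 1, end_line: 1, end_column: 17 }
html_block_pos = { start_line: 1, start_column: 1, end_line: 0, end_column: 0 }

puts well_formed_sourcepos?(document_pos)   # true
puts well_formed_sourcepos?(html_block_pos) # false
```

A check like this could be run over every node in a parsed tree to find other affected node types, assuming the wrapper exposes the position as a hash of this shape.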

Let me know if I should open an issue on Commonmarker instead, but I don't see how this could be caused by the wrapper. I just lack the Rust expertise to test it directly in Rust, sorry!

kivikakk commented 3 months ago

Thanks for the report! This is definitely a Comrak thing. No guarantees on when I'll be able to look into this, but it shoooooould be fairly simple. 🤞

digitalmoksha commented 3 months ago

@kivikakk it looks like this is occurring for the tags `<script>`, `<pre>`, `<textarea>`, and `<style>`. They get recognized in `html_block_start` (https://github.com/kivikakk/comrak/blob/42f45948c1de9cbd8ff5c70de4f923bbc47dac41/src/scanners.re#L152), which returns 1, and then https://github.com/kivikakk/comrak/blob/feaf5cffd19435e554535705dbdcea5bf8fd7884/src/parser/mod.rs#L1928-L1938 handles it.

I'm not sure why these would get treated differently than any other HTML block tag, such as a <p>?

kivikakk commented 2 months ago

See https://spec.commonmark.org/0.31.2/#html-blocks.
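For readers following the link: that section gives `<pre>`, `<script>`, `<style>`, and `<textarea>` their own start and end conditions. The block starts on a line that begins with one of those opening tags and ends on the first line that contains one of the matching closing tags, and when the first line meets both conditions the block is just that line. A minimal Ruby sketch of that rule follows, assuming the block starts on the first line of the input. The method names are hypothetical, and this is not Comrak's scanner, only an illustration of the spec text.

```ruby
# Start condition 1: up to three spaces of indentation, then <pre, <script,
# <style, or <textarea (case-insensitive), followed by a space, a tab, ">",
# or the end of the line. The line is assumed to have no trailing newline.
def type1_start?(line)
  line.match?(/\A {0,3}<(?:pre|script|style|textarea)(?:[ \t>]|\z)/i)
end

# End condition 1: the line contains </pre>, </script>, </style>, or
# </textarea> (case-insensitive). It need not match the opening tag.
def type1_end?(line)
  line.match?(%r{</(?:pre|script|style|textarea)>}i)
end

# Returns [start_line, end_line] (1-based, inclusive) for a block of this
# kind that starts on the first line of the input, or nil if there is none.
def type1_block_lines(input)
  lines = input.lines(chomp: true)
  return nil if lines.empty? || !type1_start?(lines.first)

  # The block runs to the first line meeting the end condition, which may be
  # the first line itself, or to the end of the input if no line meets it.
  end_index = lines.index { |line| type1_end?(line) } || lines.length - 1
  [1, end_index + 1]
end

p type1_block_lines("<script></script>")
p type1_block_lines("<script>\nlet x = 1;\n</script>\nafter")
p type1_block_lines("<p>hi</p>")
```

The first call is the single-line case from the report: both conditions are met on line 1, so under this reading of the spec the block would be expected to end on line 1 rather than line 0.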

digitalmoksha commented 2 months ago

Of course, I should've looked at the spec 🤦