highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License

(YAML) Multiline strings don't support empty lines #4090

Open dmelikhov opened 1 month ago

dmelikhov commented 1 month ago

Describe the issue: YAML multiline strings break on the first empty line, even when the rest of the block is properly indented.

Which language seems to have the issue? YAML

Are you using highlight or highlightAuto? highlight

Sample Code to Reproduce

foo:
  bar: |
    still: a string

    not: anymore

Additional context: I was able to work around the issue by replacing this regex with

[\\|>]([1-9]?[+-])?[ ]*\\n+( +)[^ ][^\\n]*\\n+(\\2[^\\n]+\\n*)*

It's not perfect and still has some false positives/negatives but should cover more valid cases.

joshgoebel commented 1 month ago

YAML is the worst. :) What are the valid rules for multi-line strings anyways? Is there some chance we could do this better with our own parser rather than a super complex regex?

Did we just change the \n+ to \n*?

dmelikhov commented 1 month ago

The first thing that came to my mind after seeing the regex is this. :)

I don't know enough about the library's architecture, but I don't think it's possible to properly detect multiline strings using a regex, as the regex knows nothing about the indentation of the property that contains the multiline string.
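To illustrate the point about indentation, here is a minimal sketch of a line-based scan that does track the indentation of the property holding the block scalar. `findBlockScalars` and its header pattern are hypothetical names made up for this example and are not part of highlight.js. The sketch assumes space-only indentation and ignores explicit indentation indicators, comments after the indicator, and flow collections.

```javascript
// Hypothetical helper, not a highlight.js API.
// Returns [firstLine, lastLine] pairs (0-based, inclusive) for the body of
// each block scalar found in `source`.
function findBlockScalars(source) {
  const lines = source.split("\n");
  const ranges = [];
  // Assumed header shape: "key: |" or "- |", optionally followed by
  // chomping/indentation indicators such as "+", "-", "2-".
  const header = /^( *)(?:[^ #].*?: +|- +)[|>][1-9+-]{0,2} *$/;

  for (let i = 0; i < lines.length; i++) {
    const m = header.exec(lines[i]);
    if (!m) continue;

    const parentIndent = m[1].length; // indentation of the owning property
    let end = i; // last line known to belong to the body

    for (let j = i + 1; j < lines.length; j++) {
      const line = lines[j];
      if (line.trim() === "") continue; // empty lines may sit inside the body
      const indent = line.length - line.trimStart().length;
      if (indent <= parentIndent) break; // dedent to the parent ends the scalar
      end = j;
    }

    if (end > i) ranges.push([i + 1, end]);
    i = end; // resume scanning after the body
  }
  return ranges;
}

const sample = "foo:\n  bar: |\n    still: a string\n\n    not: anymore\n";
console.log(JSON.stringify(findBlockScalars(sample))); // → [[2,4]]
```

The body ends at the first non-empty line that is not indented deeper than the owning property, so the empty line in the sample no longer cuts the string short. Whether something like this fits into the library's mode system is a separate question.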

@joshgoebel In the regex I mentioned, \n is replaced by \n+ and \n? is replaced by \n*. But I can't give you any guarantees that it's better than the currently used one.
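A quick way to see the effect of those two substitutions on the sample from the report. The `before` pattern below is an assumption: it is reconstructed by undoing the substitutions described above and may not match the library's current source exactly. The `after` pattern is the workaround as posted.

```javascript
// Sample from the report, with a trailing newline.
const sample = [
  "foo:",
  "  bar: |",
  "    still: a string",
  "",
  "    not: anymore",
  "",
].join("\n");

// Assumed original: the workaround with \n+ turned back into \n and \n* into \n?.
const before = new RegExp(
  "[\\|>]([1-9]?[+-])?[ ]*\\n( +)[^ ][^\\n]*\\n(\\2[^\\n]+\\n?)*"
);
// Workaround pattern as posted in this issue.
const after = new RegExp(
  "[\\|>]([1-9]?[+-])?[ ]*\\n+( +)[^ ][^\\n]*\\n+(\\2[^\\n]+\\n*)*"
);

// Does the match extend past the empty line?
const coversSecondLine = (re) => re.exec(sample)[0].includes("not: anymore");

console.log(coversSecondLine(before)); // → false
console.log(coversSecondLine(after)); // → true
```

With `\n?`, each repeated group can consume at most one newline, so the empty line stops the repetition. With `\n+` and `\n*`, runs of newlines are absorbed and the next line only has to start with the captured indentation.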

Dxuian commented 2 weeks ago

I believe I added a fix for this. @joshgoebel, please review #4111 (and my other PRs) 🙏🙏🙏