pngwn / MDsveX

A markdown preprocessor for Svelte.
https://mdsvex.pngwn.io
MIT License

"Expected valid tag name" when using special characters like < #76

Open rhythm-section opened 4 years ago

rhythm-section commented 4 years ago

Hello @pngwn, thank you for this great project!

I am having an issue when using the < character inside a Markdown file, even after replacing it with &lt;. It seems the &lt; gets converted back to <, resulting in the same error from the Svelte compiler ("Expected valid tag name"). I also tried escaping the character with \, without success.

Is there any way to escape special characters so the svelte compiler does not throw an error?

My current "solution" is to wrap < inside an inline code block, but I do not want to show that part as code. This is the Markdown file I am talking about: https://github.com/nymea/nymea-plugins/blob/rework-readmes/awattar/README.md. This is the version with the "fixed" inline code blocks. When those are removed, the error is thrown.

I am not sure if this is a bug or if I am missing something here. I just found this code section in MDsveX:

// in code nodes replace the character with the html entities
// maybe I'll need more of these

const entites = [
    [/</g, '&lt;'],
    [/>/g, '&gt;'],
    [/{/g, '&#123;'],
    [/}/g, '&#125;'],
];

So I guess the < character should only be replaced inside code blocks as mentioned in the comment, but using &lt; leads to the same error because somewhere during the preprocess it gets replaced with < again.
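
For readers following along, here is a minimal sketch of how a table like the one quoted above could be applied to the text of a code node. The pattern and entity pairs are taken from the snippet; the function name `escapeCodeText` and the `reduce` loop are assumptions made for illustration, not mdsvex's actual implementation.

```javascript
// Sketch only: the pairs mirror the snippet quoted above.
// escapeCodeText is a hypothetical name, not part of mdsvex.
const entities = [
  [/</g, '&lt;'],
  [/>/g, '&gt;'],
  [/{/g, '&#123;'],
  [/}/g, '&#125;'],
];

// Apply every replacement in order to the raw text of a code node.
function escapeCodeText(text) {
  return entities.reduce(
    (out, [pattern, entity]) => out.replace(pattern, entity),
    text
  );
}

console.log(escapeCodeText('if (a < b) { return a > 0; }'));
// prints "if (a &lt; b) &#123; return a &gt; 0; &#125;"
```

None of the replacement strings contain <, >, { or }, so the order of the pairs does not matter here.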

pngwn commented 4 years ago

Interesting. I would expect a raw < to break, but mdsvex only explicitly replaces the above characters in fenced code (either inline or block). Using &lt; etc. should work. The markdown parser may be converting these entities behind the scenes. I'll take a look at this.

pngwn commented 4 years ago

They are getting decoded by the markdown parser, so this problem is a little more complex than I thought. I have a potential solution, but I'm going to see if there is a simpler way of solving the problem.

In other news my investigations have uncovered another bug, the following does not work either:

 - 1 {"<"} 2

When smartypants is enabled (which it is by default), the quotes get converted to fancy quotes. (#83)

pngwn commented 4 years ago

I can feel another custom node type coming on (for entities), modifying the parser seems to be the "best" approach and should be less work than trying to selectively undo the entity decoding in the transform phase.

rhythm-section commented 4 years ago

Thank you for the investigation! The custom node type sounds good to me.

TheComputerM commented 4 years ago

Try to use {@html ...} as a workaround

cesutherland commented 4 years ago

I'm running into this with Katex as well: #113

pngwn commented 4 years ago

This will be partially addressed by the work discussed in #116. I can make > and } be legal characters in the document without issue (as my html syntax will be very strict and I can escape plain text variants of those characters), however < and { will never be legal plain text characters as they mark the start of various states that the parser will enter. To a degree this issue is irresolvable because some characters just conflict with html and svelte syntax in a way that cannot be correctly analysed. There are a few ways to support some cases but I'll have to look into those at a later date.

pngwn commented 4 years ago

#113 has some other information and a nice test case.

wlach commented 3 years ago

> This will be partially addressed by the work discussed in #116. I can make > and } be legal characters in the document without issue (as my html syntax will be very strict and I can escape plain text variants of those characters), however < and { will never be legal plain text characters as they mark the start of various states that the parser will enter. To a degree this issue is irresolvable because some characters just conflict with html and svelte syntax in a way that cannot be correctly analysed. There are a few ways to support some cases but I'll have to look into those at a later date.

I wonder if mdsvex should interpret < and { characters followed by a space literally. Well-formatted HTML/svelte generally doesn't do this, and handling it in a special way would allow a number of obvious cases to work, such as this one: https://github.com/pngwn/MDsveX/issues/113#issue-675576096 (tl;dr: writing foo < bar to make some didactic point)
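
A rough sketch of what that rule could look like for <, assuming it runs on plain text after the markdown parser has already decoded entities, and not on code spans or raw HTML. The function name `escapeLooseLessThan` is hypothetical and nothing like it is known to exist in mdsvex; { is deliberately left out of this sketch.

```javascript
// Hypothetical helper, not part of mdsvex.
// A "<" followed by whitespace or by the end of the input cannot start a
// tag, so it is rewritten to an entity the Svelte compiler will accept.
function escapeLooseLessThan(text) {
  return text.replace(/<(?=\s|$)/g, '&lt;');
}

console.log(escapeLooseLessThan('foo < bar'));  // prints "foo &lt; bar"
console.log(escapeLooseLessThan('<b>bold</b>')); // prints "<b>bold</b>"
```

Only a < directly followed by whitespace is touched, so ordinary tags and components pass through unchanged.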

pngwn commented 3 years ago

Yeah, I am considering this for < for this specific reason. It seems a reasonable tradeoff because otherwise writing very basic syntax will be very difficult. This is especially notable for mdsvex because users are typically developers of some description, and less-than and greater-than symbols will appear more often than in a typical document.

For curly braces, I'm less certain. It is quite common to have leading and trailing spaces for text expressions (example). I think block syntax requires there to be no space before the # in the current implementation but I can't quite recall as there is no spec.

Curly braces are just generally problematic. They are quite commonly used in custom markdown syntax for additional metadata/attributes, but they pose a bit of a problem because of their importance to svelte. I'll take a look at some popular use-cases and see if I can figure out a way to disambiguate them when I start work on yet another parser for mdsvex.

I have a new parser (svelte-parse) that observes this rule and has a well-defined AST, although not a parsing spec. However, this will need to be rewritten, probably twice (don't ask). I will be starting the first rewrite soon; the second will have no user impact and will be purely internal, but it is more of a long-term goal. When I do the first rewrite, it will also have a parsing spec.

wighawag commented 3 years ago

Is there a workaround for now?

I tried the following in the playground and all of them fail:

5 &lt; 10
5 < 10
5 {"<"} 10
5 {<} 10
5 {@html <} 10
5 {@html &lt;} 10

josephg commented 3 years ago

It's awful, but double escaping seems to work:

5 &amp;lt; 10

It doesn't work in the playground though. For some reason &amp; makes the playground hit another bug and error with "Document is not defined".
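
A small sketch of why double escaping survives, assuming (as described earlier in the thread) that the markdown parser decodes entities exactly once. `decodeOnce` is a hypothetical stand-in that handles only three named entities; it is not the parser's real decoder.

```javascript
// Hypothetical single-pass decoder covering just &lt; &gt; and &amp;
const named = { lt: '<', gt: '>', amp: '&' };

function decodeOnce(text) {
  // replace() never rescans its own output, so the "&" produced from
  // "&amp;" is not combined with the "lt;" that follows it.
  return text.replace(/&(lt|gt|amp);/g, (match, name) => named[name]);
}

console.log(decodeOnce('5 &lt; 10'));     // prints "5 < 10"
console.log(decodeOnce('5 &amp;lt; 10')); // prints "5 &lt; 10"
```

Under that assumption, the single-escaped form reaches the Svelte compiler as a raw <, while the double-escaped form reaches it as &lt;.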

josephg commented 3 years ago

> I wonder if mdsvex should interpret < and { characters followed by a space literally, generally well-formatted HTML/svelte doesn't do this and handling this in a special way would allow a number of obvious cases to work, such as this one: #113 (comment) (tl;dr: writing foo < bar to make some didactic point)

The CommonMark specification has a list of rules for what constitutes legal tags. Anything that isn't a valid tag is escaped. This example shows that a < followed by a space is not considered the start of a valid tag name. E.g., < a> encodes to &lt; a&gt;. (As it does in this comment.)
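
As a rough illustration of the tag-name part of that rule: CommonMark defines a tag name as an ASCII letter followed by any number of ASCII letters, digits, or hyphens. The sketch below checks only that prefix, not the full open-tag grammar (attributes, the closing >), and `startsLikeTag` is a hypothetical name.

```javascript
// "<" or "</", then an ASCII letter, then letters, digits, or hyphens.
// This tests the tag-name prefix only, not a complete CommonMark tag.
const TAG_START = /^<\/?[A-Za-z][A-Za-z0-9-]*/;

function startsLikeTag(text) {
  return TAG_START.test(text);
}

console.log(startsLikeTag('<a>'));    // prints true
console.log(startsLikeTag('< a>'));   // prints false
console.log(startsLikeTag('</div>')); // prints true
console.log(startsLikeTag('<5'));     // prints false
```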

CommonMark has a test suite of JSON content. We should get that test suite passing in mdsvex.

Madd0g commented 3 years ago

> When smartypants is enabled (which it is by default), the quotes get converted to fancy quotes. (#83)

Yes, this is compounded in plugins as well. For example, I started using remark-directive, and quotes are transformed into smart quotes before the directive parsing part runs, which messes up parameter passing.

I guess it would be cool to have more control over it. Maybe over when this conversion occurs in the mdsvex pipeline? Or maybe a way to choose the types of elements it operates on?
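
Until finer control exists, one blunt option is to turn the conversion off altogether. The sketch below assumes a `svelte.config.js` written as an ES module and relies on the `smartypants` option of mdsvex; the `.svx` extension list is only an example setup.

```javascript
// svelte.config.js (sketch; the extension list is an assumed project setup)
import { mdsvex } from 'mdsvex';

export default {
  extensions: ['.svelte', '.svx'],
  preprocess: mdsvex({
    // Disable smartypants so straight quotes stay straight.
    smartypants: false,
  }),
};
```

This removes all typographic conversions, not only quotes, so it trades convenience for predictability.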

pngwn commented 3 years ago

Mdsvex will never be CommonMark compliant, even less so in 1.0. That said, for 1.0, I'll be porting/modifying the CommonMark test cases across and restricting html syntax to solve this issue.

In the current implementation there isn't anything that can be done about it.

I'm working on the 1.0 parser now, which will bring this under my control.

MrVauxs commented 1 year ago

Hello, has this been addressed in a more reliable way than needing to double-escape the < character?