spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License

`:indent` syntax can cause out-of-order results. #577

Open blester125 opened 2 months ago

blester125 commented 2 months ago

Using the Wikipedia sandbox, we can see that :text creates an indented section.

When this same wikitext is parsed by wtf_wikipedia, the text on the last line gets moved up to the first line and the indentation comes later.

$ cat test.js 
const wtf = require('wtf_wikipedia')

console.log(wtf("This is an\n:interlude\nexample").text())
$ node test.js 
This is an example
 * interlude

I looked through the codebase but wasn't able to find a part that seems to be specifically looking for the leading-colon indentation mentioned in the indentation section here, so it seems possible this is a byproduct of the line being formatted according to the rules for a different use of the colon?

spencermountain commented 2 months ago

whoa! You're right - the order of the text gets bungled. Pretty big bug.

Thank you for the great issue. Happy to take a look, and ideally do a hotfix, this week. cheers

spencermountain commented 2 months ago

hey Brian, spent a bunch of the weekend supporting a ton of obscure inline templates - that list you made was great - it would be helpful if you could run the same analysis again, with 10.3.2.

The bad news is that the :indent bug is a gross one, and there's no cute way to fix it before v11. The problem is that our parser is ordered easy-to-hard, and not top-to-bottom. Here, 'before' and 'after' are treated as Paragraphs, and the indent is a List:

before
:indent
after
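
To make the ordering problem concrete, here is a toy sketch of the two strategies. It is not wtf_wikipedia's actual code: `isListLine`, `stripMarker`, `renderByType`, and `renderInOrder` are hypothetical names invented for illustration, and the real parser handles far more syntax. The first function extracts blocks by type (all paragraph text, then all list lines), which loses source order; the second walks the lines top-to-bottom and keeps it.

```javascript
// Toy illustration only: hypothetical helpers, NOT wtf_wikipedia internals.

// Assumption: a line starting with ':', '*' or '#' counts as a list line.
const isListLine = (line) => /^[:*#]/.test(line)
const stripMarker = (line) => line.replace(/^[:*#]+\s*/, '')

// "Easy-to-hard": collect blocks by type, so source order is lost.
function renderByType(wikitext) {
  const lines = wikitext.split('\n')
  // All non-list lines are merged into a single paragraph...
  const paragraph = lines.filter((l) => !isListLine(l)).join(' ')
  // ...and all list lines are emitted afterwards, wherever they appeared.
  const items = lines.filter(isListLine).map((l) => ' * ' + stripMarker(l))
  return [paragraph, ...items].join('\n')
}

// "Top-to-bottom": walk the lines once and emit blocks in source order.
function renderInOrder(wikitext) {
  const blocks = []
  let lastWasText = false
  for (const line of wikitext.split('\n')) {
    if (isListLine(line)) {
      blocks.push(' * ' + stripMarker(line))
      lastWasText = false
    } else if (lastWasText) {
      // Consecutive text lines belong to the same paragraph.
      blocks[blocks.length - 1] += ' ' + line
    } else {
      blocks.push(line)
      lastWasText = true
    }
  }
  return blocks.join('\n')
}

const input = 'This is an\n:interlude\nexample'
console.log(renderByType(input))
console.log(renderInOrder(input))
```

With the input from the original report, the by-type version puts "example" on the first line ahead of the indented item, while the in-order version keeps "example" after it.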

I know - it's pretty gross. I have plans to fix it, and make our parser more chronological, or AST-like. Not quickly though.

Will leave this open - please let me know if I can help further. cheers