HTML parsing bnf - Githubissues

vritant24 / cleave-html

A modular way to write static webpages

MIT License


HTML parsing bnf #7

Closed vritant24 closed 6 years ago

vritant24 commented 6 years ago

Here's what I have right now. There's no support for XML, which I think is fine to start with. It's a somewhat naive way of writing it, but I think it covers most of the use cases. Do you think anything is missing?

<text>        : "any string" && !<tag>
              | <tag>

<comment>     : "<!--" <text> "-->"

<keyword>     : [a-zA-Z\-\*]+

<attribute>   : <keyword> [  "="  ["] <text> ["]  ](0-1)

<tag>         : "<"  <keyword>  [<attribute>]*  ">"  <text>  "</"  <keyword>  ">"
              | "<"  <keyword>  [<attribute>]*  "/>"

Attribute names such as http-equiv and data-* are the reason the keyword rule has to allow - and *.
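To make the grammar easier to poke at, here is a minimal recursive-descent sketch of it in Python. It is only an illustration, not the project's parser: the names `Tag`, `ParseError`, `parse_nodes` and `parse_tag` are made up for this example. It also takes a few liberties that the grammar does not state: element content is treated as a sequence of text runs and nested tags, attribute values must be double-quoted, whitespace between attributes is skipped, and comments are left out.

```python
import re
from dataclasses import dataclass, field

# <keyword>: letters plus "-" and "*" (for names like http-equiv and data-*)
KEYWORD = re.compile(r"[a-zA-Z\-\*]+")
# Optional attribute value; this sketch assumes it is always double-quoted
ATTR_VALUE = re.compile(r'="([^"]*)"')
WHITESPACE = re.compile(r"\s*")


class ParseError(Exception):
    pass


@dataclass
class Tag:
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # items are str or Tag


def parse_nodes(src, pos=0):
    """Parse text runs and nested tags until a closing tag or end of input."""
    nodes = []
    while pos < len(src) and not src.startswith("</", pos):
        if src[pos] == "<":
            tag, pos = parse_tag(src, pos)
            nodes.append(tag)
        else:
            end = src.find("<", pos)
            end = len(src) if end == -1 else end
            nodes.append(src[pos:end])
            pos = end
    return nodes, pos


def parse_tag(src, pos):
    """Parse one <tag> starting at the '<' found at src[pos]."""
    match = KEYWORD.match(src, pos + 1)
    if not match:
        raise ParseError(f"expected a tag name at offset {pos + 1}")
    tag = Tag(match.group())
    pos = WHITESPACE.match(src, match.end()).end()

    # Zero or more <attribute> entries, each with an optional ="value"
    while (match := KEYWORD.match(src, pos)):
        name, pos, value = match.group(), match.end(), None
        if (value_match := ATTR_VALUE.match(src, pos)):
            value, pos = value_match.group(1), value_match.end()
        tag.attributes[name] = value
        pos = WHITESPACE.match(src, pos).end()

    if src.startswith("/>", pos):  # second alternative: self-closing tag
        return tag, pos + 2
    if not src.startswith(">", pos):
        raise ParseError(f"expected '>' or '/>' at offset {pos}")

    tag.children, pos = parse_nodes(src, pos + 1)
    closing = f"</{tag.name}>"
    if not src.startswith(closing, pos):
        raise ParseError(f"expected {closing} at offset {pos}")
    return tag, pos + len(closing)
```

Calling `parse_nodes('<div class="a"><meta http-equiv="x"/>hi</div>')` returns a one-element list holding a `div` tag whose children are a `meta` tag and the string `hi`. An unclosed tag or a mismatched closing name raises `ParseError`.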

claytn commented 6 years ago

I believe this grammar works, or at least I haven't been able to think of an example that breaks it.

claytn commented 6 years ago

Also, after some quick Googling I found this Stack Overflow post, which has some helpful advice: https://stackoverflow.com/questions/7192101/writing-an-html-parser

I don't think we are looking to build a validating parser, though. I think our "parser" will more or less just distinguish regular HTML elements from unknown, imported elements and inline the imported ones as needed. Do you agree or disagree?
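As a rough sketch of that idea, the snippet below leaves known HTML tags alone and replaces unknown self-closing tags with the markup of a matching component. Everything in it is an assumption made for illustration: the `KNOWN_HTML_TAGS` subset, the `components` mapping from tag name to markup, and the choice to reference components as self-closing tags. It does not reflect how the project actually resolves imports.

```python
import re

# Small illustrative subset; a real list would cover the full HTML tag set
KNOWN_HTML_TAGS = {
    "html", "head", "body", "div", "span", "p", "a",
    "img", "meta", "link", "br", "hr", "input", "nav",
}

# Self-closing tags without attributes, e.g. <nav-bar/>
SELF_CLOSING = re.compile(r"<([a-zA-Z][a-zA-Z\-]*)\s*/>")


def inline_components(html, components):
    """Replace unknown self-closing tags with their component markup.

    `components` is a hypothetical mapping of tag name to markup.
    Components may reference other components; cycles are not detected.
    """
    def replace(match):
        name = match.group(1)
        if name.lower() in KNOWN_HTML_TAGS or name not in components:
            return match.group(0)  # regular HTML or unresolved: keep as-is
        return inline_components(components[name], components)

    return SELF_CLOSING.sub(replace, html)
```

For example, with `components = {"nav-bar": "<nav><site-logo/></nav>", "site-logo": "<img/>"}`, the input `<body><nav-bar/><br/></body>` becomes `<body><nav><img/></nav><br/></body>`. No validation happens here: anything that is not a recognized component reference passes through unchanged.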

vritant24 commented 6 years ago

Yeah, as of now validation doesn't matter. Our first goal is only to inline. We could have users follow the HTML5 standard and emit warnings as we parse, but that doesn't affect what we're doing for the modularization.