izyuumi / html2md-rs

HTML to Markdown Parser in Rust
https://crates.io/crates/html2md-rs
MIT License

Incorrect Error: Malformed attribute #23

Closed getreu closed 5 months ago

getreu commented 5 months ago

input

create with

curl https://askubuntu.com/questions/189640/how-to-find-architecture-of-my-pc-and-ubuntu -o test.txt

The file:

test.txt

Incorrect Error

Malformed attribute: id="search" role="search" action=/search class="s-topbar--searchbar js-searchbar " autocomplete="off" - Missing quotation mark at around index 13951

izyuumi commented 5 months ago

I'm pretty sure that error is coming from the action=/search part of your HTML snippet. Do you know if that's compliant with the HTML standard?

getreu commented 5 months ago

You can use [The W3C Markup Validation Service](https://validator.w3.org/).

Apart from that, since action=/search does not affect the Markdown rendition anyway, I suggest silently ignoring it. After all, your lib is a converter, not a validator.

izyuumi commented 5 months ago

I went to the website you shared, searched for /search, and found this. It does not look like action=/search, but rather action="/search".

[Screenshot: CleanShot 20240402 235157@2x]

P.S. This project is indeed not a validator. However, it shouldn't have to parse every kind of HTML-like snippet; whatever input is given should meet some standards.

getreu commented 5 months ago

Have you tried this?

curl https://askubuntu.com/questions/189640/how-to-find-architecture-of-my-pc-and-ubuntu -o test.txt

This is one of the workflows by which my users' HTML is generated.

About attributes

HTML Attributes

The HTML standard does not require quotes around attribute values.

However, W3C recommends quotes in HTML, and demands quotes for stricter document types like XHTML.

This means that action=/search and action="/search" are both valid HTML, even though the latter is preferred.
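
To illustrate what accepting both forms could look like, here is a minimal sketch of a lenient attribute parser. It is not html2md-rs's actual code: the function name `parse_attributes` and its return type are hypothetical, and it deliberately skips details a real HTML tokenizer needs (whitespace around `=`, character references, duplicate names).

```rust
/// Hypothetical helper, not part of html2md-rs: splits the attribute portion
/// of a start tag into (name, value) pairs. Values may be double-quoted,
/// single-quoted, or unquoted; bare names such as `disabled` get `None`.
fn parse_attributes(input: &str) -> Vec<(String, Option<String>)> {
    let chars: Vec<char> = input.chars().collect();
    let mut attrs = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        // Skip whitespace between attributes.
        while i < chars.len() && chars[i].is_whitespace() {
            i += 1;
        }
        if i >= chars.len() {
            break;
        }

        // Read the attribute name up to '=' or whitespace.
        let name_start = i;
        while i < chars.len() && chars[i] != '=' && !chars[i].is_whitespace() {
            i += 1;
        }
        let name: String = chars[name_start..i].iter().collect();

        // No '=' directly after the name: treat it as a bare attribute.
        if i >= chars.len() || chars[i] != '=' {
            attrs.push((name, None));
            continue;
        }
        i += 1; // consume '='

        let value: String = if i < chars.len() && (chars[i] == '"' || chars[i] == '\'') {
            // Quoted value: read up to the matching quote character.
            let quote = chars[i];
            i += 1;
            let start = i;
            while i < chars.len() && chars[i] != quote {
                i += 1;
            }
            let v: String = chars[start..i].iter().collect();
            if i < chars.len() {
                i += 1; // consume the closing quote
            }
            v
        } else {
            // Unquoted value (e.g. action=/search): read up to the next whitespace.
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() {
                i += 1;
            }
            chars[start..i].iter().collect()
        };
        attrs.push((name, Some(value)));
    }
    attrs
}
```

Fed the attribute string from the error message above, this sketch returns five pairs, with `action` mapped to `/search` and the quoted `class` value kept intact, trailing space included. The point is only that the unquoted branch can sit alongside the quoted one without reporting a missing quotation mark.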