wikimedia / html-metadata

MetaData html scraper and parser for Node.js (supports Promises and callback style)
MIT License

Microdata: Underlying lib does not honour "content" attribute on all tags #68

Open iamjochem opened 6 years ago

iamjochem commented 6 years ago

hi there,

the module used to parse microdata does not honour the microdata spec with regard to the "content" attribute.

the spec states:

HTML only allows the content attribute on the meta element. This specification changes the content model to allow it on any element, as a global attribute.

the relevant function in microdata-node only looks at the "content" attribute for "meta" tags.
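
to make the difference concrete, here is a minimal sketch. it is not microdata-node's actual code: the element shape (`tagName`, `attributes`, `textContent`) and both function names are hypothetical, and the other property-value rules (`src`, `href`, `datetime`, ...) are left out on purpose. it only contrasts reading "content" on "meta" alone with reading it on any element, as the quoted spec text allows.

```javascript
// Hypothetical, simplified element shape: { tagName, attributes, textContent }.
// This is NOT microdata-node's real code; it only contrasts the two behaviours.

// Behaviour described in this issue: "content" is read only on <meta>.
function valueMetaOnly(el) {
  if (el.tagName === 'meta') {
    return el.attributes.content || '';
  }
  return el.textContent;
}

// Behaviour per the quoted spec text: "content" is honoured on any element.
function valueContentAnywhere(el) {
  if (Object.prototype.hasOwnProperty.call(el.attributes, 'content')) {
    return el.attributes.content;
  }
  return el.textContent;
}

// Stands in for: <span itemprop="price" content="19.99">€19,99</span>
const price = {
  tagName: 'span',
  attributes: { itemprop: 'price', content: '19.99' },
  textContent: '€19,99',
};

console.log(valueMetaOnly(price));        // €19,99
console.log(valueContentAnywhere(price)); // 19.99
```

with the meta-only rule the scraper ends up with the human-readable text instead of the machine-readable value the page author put in "content".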

I mention this problem here because the microdata-node module has seen no development in the last 3 years.

mvolz commented 6 years ago

Thanks for reporting - why don't you report it over there as well? It doesn't look like anyone has reported it. It does look dead but sometimes that happens because no one has been reporting issues :).

Let us know if you know of another library that's better as well.

iamjochem commented 6 years ago

hi!

why didn't I report it there? it looking dead was one reason, microdata-node being your dep the other ... I have now opened a new issue there (and referenced this one) ... it is now a question of waiting for a reply/opinion (I hesitate to call it a "bug" because I feel this has everything to do with wanting to deal gracefully with tag-soup regardless of spec(s))

I am not aware of a better microdata-specific lib TBH - I have been using this module side by side with suq for webpage data-scraping purposes (this module and suq have a lot of overlap but each one has its own strengths - leveraging both allows me to cover more "HTML scenarios" :-) ).

kind rgds.