Closed richardreeze closed 1 month ago
Thanks for creating the issue @richardreeze !
This would be a great addition to the tool. The meta-data has semantic value and would definitely benefit some use-cases (like search engines).
Researching this, I think it makes sense to include all the meta-data level parts in the output, like:
As some of the data (especially social-network-related tags) might overlap with other meta-data, I'm considering having two levels under the "includeMetaData" option:
- basic: level that includes meta tags
- extended: level that includes social tags, JSON-LD data, and possibly other formats (like RDFa) down the road
As for the output format, I would prefer to avoid pure JSON (though we could add a flag to choose the formatting of the meta-data block). I'd prefer YAML for this, so something along the lines of:
---
title: "Page Title"
description: "Page description here."
schema:
  "@context": "https://schema.org"
  "@type": "WebPage"
  # ... rest of the schema
---
# H1 tag ...
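To make the proposal a bit more concrete, here is a rough TypeScript sketch of how such a front-matter block could be rendered for the two levels. Everything in it is hypothetical: `MetaLevel`, `PageMeta`, `yamlLines`, and `renderFrontMatter` are illustrative names, not part of the tool, and the serializer only covers the simple shapes shown above (scalars and arrays are written as JSON literals, which YAML accepts).

```typescript
// Hypothetical types for illustration only; not the tool's actual API.
type MetaLevel = "basic" | "extended";

interface PageMeta {
  title?: string;
  description?: string;
  // Parsed JSON-LD object; only emitted at the "extended" level.
  schema?: Record<string, unknown>;
}

// Serialize a plain object as YAML block-mapping lines.
// Nested objects become indented mappings; scalars and arrays are
// written as JSON literals, which are also valid YAML.
function yamlLines(obj: Record<string, unknown>, depth: number = 0): string[] {
  const pad = "  ".repeat(depth);
  const lines: string[] = [];
  for (const [key, value] of Object.entries(obj)) {
    if (value === undefined) continue; // skip fields that were not found
    // Quote keys that are not plain identifiers (e.g. "@context").
    const k = /^[A-Za-z_][A-Za-z0-9_]*$/.test(key) ? key : JSON.stringify(key);
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      lines.push(`${pad}${k}:`);
      lines.push(...yamlLines(value as Record<string, unknown>, depth + 1));
    } else {
      lines.push(`${pad}${k}: ${JSON.stringify(value)}`);
    }
  }
  return lines;
}

// Build the front-matter block for the requested level.
function renderFrontMatter(meta: PageMeta, level: MetaLevel): string {
  const selected: Record<string, unknown> = {
    title: meta.title,
    description: meta.description,
  };
  if (level === "extended") {
    selected.schema = meta.schema;
  }
  return ["---", ...yamlLines(selected), "---"].join("\n");
}
```

Calling `renderFrontMatter(meta, "basic")` would emit only the title and description, while `"extended"` would also include the nested schema mapping. A real implementation would probably lean on an existing YAML library rather than a hand-rolled serializer.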
Thoughts?
This would be great! And yes, much cleaner than what I suggested.
I hope this becomes a feature! 🙂
@richardreeze hot off the compiler:
> npx d2m -u https://example.com -meta standard
---
title: "Example Domain"
---
# Example Domain
This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.
[More information...](https://www.iana.org/domains/example)
It would be great if the tool had an option for including the page's schema (which is contained inside the HTML), in the same way I can select onlyMainContent, includeHtml, etc.
Example
Here's a simple example of how I currently extract schema from a page's HTML
It usually looks like this:
Benefits
This would provide additional information that could be valuable for various use cases (like SEO analysis and structured data extraction).
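For readers who want a starting point, the kind of schema extraction described in the Example section could be sketched roughly as follows in TypeScript. This is not the snippet from the original issue, and `extractJsonLd` is a hypothetical helper name. It assumes the schema is embedded as JSON-LD in `<script type="application/ld+json">` tags and uses a deliberately naive regular expression; a proper HTML parser would be more robust.

```typescript
// Hypothetical helper: collect every JSON-LD block found in an HTML string.
// Assumes the schema lives in <script type="application/ld+json"> tags.
function extractJsonLd(html: string): unknown[] {
  // Naive pattern: matches the script tag and captures its body.
  const pattern =
    /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      results.push(JSON.parse(match[1]));
    } catch {
      // Ignore blocks whose content is not valid JSON.
    }
  }
  return results;
}
```

Each parsed object could then be attached to the metadata block, for example under a `schema` key.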