mozilla / page-metadata-parser

DEPRECATED - A JavaScript library for parsing metadata on a web page.
https://www.npmjs.com/package/page-metadata-parser
Mozilla Public License 2.0

Custom formatting for provider? #88

artemjackson closed this issue 7 years ago

artemjackson commented 7 years ago

For sites that don't provide `og:site_name` (like schema.org), I'd like `provider` to be `schema.org` instead of just `schema`.

How could I achieve this?

A little example of what I'm trying to do:

```javascript
// Uses the legacy jsdom.env() API (jsdom versions before 10)
const jsdom = require('jsdom');
const { getMetadata } = require('page-metadata-parser');

const url = 'http://schema.org/';

jsdom.env(url, (err, window) => {
    if (err) {
        console.error(err);
        return;
    }
    const pageMetadata = getMetadata(window.document, url);
    console.log(pageMetadata.provider);    // 'schema', but I'd like it to be 'schema.org'
});
```
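To make the reported behavior concrete, here is a rough approximation of how a provider name like `schema` can fall out of a hostname. It assumes the fallback drops a leading `www.` and the top-level domain; `approximateProviderFromHost` is a hypothetical name, and this is not page-metadata-parser's actual code.

```javascript
// Rough approximation (NOT the library's actual code) of deriving a provider
// name from a hostname: drop a leading "www." and the top-level domain.
function approximateProviderFromHost(hostname) {
  const labels = hostname.replace(/^www\./, '').split('.');
  // Keep everything except the last label (the TLD), e.g. "schema.org" -> "schema".
  return labels.length > 1 ? labels.slice(0, -1).join('.') : labels[0];
}

console.log(approximateProviderFromHost(new URL('http://schema.org/').hostname)); // 'schema'
```

Because the TLD is discarded, the result reads like a name rather than a domain, which matches the `'schema'` value seen in the issue.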
jaredlockhart commented 7 years ago

Yeah, I see what you mean. `provider` isn't supposed to be a URL-like thing; it's meant to be something more like a company or institution name. I think you can get what you want just by parsing the URL itself, something like:

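```javascript
// The WHATWG URL class is a global in Node.js 10+ and in browsers;
// the legacy require('url').parse(url).host returns the same value here.
const url = 'https://schema.org';
console.log(new URL(url).host); // 'schema.org'
```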
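If you still want to use a real site name when the page declares one, you could combine the two approaches. The sketch below assumes you read `og:site_name` yourself and pass it in; `displayProvider` is a hypothetical helper, not part of page-metadata-parser.

```javascript
// Hypothetical helper (not part of page-metadata-parser): prefer the page's
// og:site_name when present, otherwise fall back to the bare hostname.
function displayProvider(siteName, pageUrl) {
  if (typeof siteName === 'string' && siteName.trim() !== '') {
    return siteName.trim();
  }
  // Strip a leading "www." so https://www.schema.org and https://schema.org agree.
  return new URL(pageUrl).hostname.replace(/^www\./, '');
}

// With a DOM document (e.g. from jsdom), the og:site_name value could be read like this:
// const meta = document.querySelector('meta[property="og:site_name"]');
// const siteName = meta ? meta.getAttribute('content') : null;

console.log(displayProvider(null, 'https://schema.org/'));                 // 'schema.org'
console.log(displayProvider('Mozilla', 'https://www.mozilla.org/en-US/')); // 'Mozilla'
```

Using `hostname` rather than `host` keeps any explicit port (e.g. `:8080`) out of the displayed name.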