pdehaan closed this 7 years ago.
A slightly verbose way to do it using `tldjs` and `url.parse()`:

```js
const tld = require('tldjs');
const url = require('url');

const domain = "https://www.bbc.co.uk/foo/bar.html";
const parsed = url.parse(domain);

// Inject the values from tldjs:
parsed.tldExists = tld.tldExists(domain); // true
parsed.domain = tld.getDomain(domain); // bbc.co.uk
parsed.subdomain = tld.getSubdomain(domain); // www
parsed.publicSuffix = tld.getPublicSuffix(domain); // co.uk
parsed.isValid = tld.isValid(domain); // true

console.log(JSON.stringify(parsed, null, 2));
```
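The `domain`, `subdomain`, and `publicSuffix` values that tldjs returns come from matching the hostname against the Public Suffix List and taking the longest matching suffix. The following is a minimal sketch of that idea, not tldjs's actual implementation. It assumes a tiny hard-coded `SUFFIXES` set (hypothetical, standing in for the full list tldjs bundles) and ignores the real list's wildcard and exception rules.

```javascript
// Sketch of longest-public-suffix splitting (the idea behind tldjs's
// getPublicSuffix / getDomain / getSubdomain).
// SUFFIXES is a hypothetical, tiny stand-in for the Public Suffix List;
// wildcard (*) and exception (!) rules are NOT handled here.
const SUFFIXES = new Set(['com', 'org', 'uk', 'co.uk']);

function splitHostname(input) {
  // Accept either a full URL or a bare hostname.
  const hostname = input.includes('://') ? new URL(input).hostname : input;
  const labels = hostname.toLowerCase().split('.');

  // Try candidates from longest to shortest; the first hit is the
  // longest known suffix (so "co.uk" wins over "uk").
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (!SUFFIXES.has(candidate)) continue;

    if (i === 0) {
      // The whole hostname is itself a suffix: no registrable domain.
      return { publicSuffix: candidate, domain: null, subdomain: '' };
    }
    return {
      publicSuffix: candidate,
      // Registrable domain = one label plus the suffix.
      domain: labels.slice(i - 1).join('.'),
      // Everything to the left of the registrable domain.
      subdomain: labels.slice(0, i - 1).join('.'),
    };
  }

  // No known suffix matched.
  return { publicSuffix: null, domain: null, subdomain: '' };
}
```

For `https://www.bbc.co.uk/foo/bar.html` this yields `co.uk`, `bbc.co.uk`, and `www`, the same values as the tldjs comments above, while `www.bbc.com` splits into `com`, `bbc.com`, and `www`. The suffix set is the part that cannot be guessed from the hostname alone, which is why a maintained list (as in tldjs) is needed in practice.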
Fixed by #120
We currently get these provider fields from Embed.ly, and we may be using them in Activity Stream. Notably, in this mockup we want to display "fivethirtyeight" next to a highlight: https://github.com/mozilla/activity-stream/pull/1153#issuecomment-243483512
Currently Embedly returns `provider_name`, `provider_display`, and `provider_url`. I think we partially had these before, but they were removed because they were redundant at the time: https://github.com/mozilla/page-metadata-service/issues/27

Given the following URL, we'd probably want to do the following magic:
http://www.fivethirtyeight.com/features/the-broncos-new-quarterback-is-inexperienced-but-at-least-hes-not-peyton-manning/
- `provider_url`: `http://fivethirtyeight.com`. Just the protocol and domain of the canonical URL, with no path.
- `provider_display`: looks like just the `provider_url`, minus the protocol.
- `provider_name`: `FiveThirtyEight`. Possibly extracted from the `og:site_name` or `twitter:site` metadata (minus the leading "@"). If neither is available, maybe just the `provider_url` minus the protocol, "www.", and TLD (which means we'd need to tell multi-part TLDs like bbc.co.uk apart from single-part ones like www.bbc.com).

For comparison, here is what Embedly's extract explorer returns for the same URL (http://embed.ly/docs/explore/extract?url=http%3A%2F%2Fwww.fivethirtyeight.com%2Ffeatures%2Fthe-broncos-new-quarterback-is-inexperienced-but-at-least-hes-not-peyton-manning%2F):

- `provider_url`: `http://fivethirtyeight.com`
- `provider_name`: `FiveThirtyEight`
- `provider_display`: `fivethirtyeight.com`
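The proposed rules can be sketched roughly as below. This is a minimal sketch under stated assumptions, not how Embedly or the metadata service actually computes these fields. It assumes the caller already has the canonical URL, a `meta` object with hypothetical `ogSiteName` / `twitterSite` keys holding the `og:site_name` and `twitter:site` values, and the URL's public suffix (for example from `tld.getPublicSuffix()` as in the tldjs snippet earlier in the thread). `deriveProvider` itself is a hypothetical helper name.

```javascript
// Sketch of the proposed provider_* rules.
// Assumptions: `meta.ogSiteName` / `meta.twitterSite` are hypothetical keys
// for the og:site_name and twitter:site values; `publicSuffix` comes from a
// Public Suffix List lookup (e.g. tldjs's getPublicSuffix()).
function deriveProvider(canonicalUrl, meta = {}, publicSuffix = null) {
  const { protocol, hostname } = new URL(canonicalUrl);

  // provider_url: protocol and domain of the canonical URL, no path.
  const providerUrl = `${protocol}//${hostname}`;

  // provider_display: provider_url minus the protocol.
  const providerDisplay = hostname;

  // provider_name: prefer og:site_name, then twitter:site minus a leading "@".
  let providerName = meta.ogSiteName || null;
  if (!providerName && meta.twitterSite) {
    providerName = meta.twitterSite.replace(/^@/, '');
  }

  // Fallback: hostname minus "www." and minus the public suffix.
  // Dropping the suffix (not just the last label) is what makes
  // bbc.co.uk -> "bbc" work as well as fivethirtyeight.com -> "fivethirtyeight".
  if (!providerName) {
    let name = hostname.replace(/^www\./, '');
    if (publicSuffix && name.endsWith('.' + publicSuffix)) {
      name = name.slice(0, -(publicSuffix.length + 1));
    }
    providerName = name;
  }

  return {
    provider_url: providerUrl,
    provider_display: providerDisplay,
    provider_name: providerName,
  };
}
```

With `og:site_name` set to `FiveThirtyEight`, a `http://fivethirtyeight.com/...` canonical URL produces the same three values Embedly returns above. With no metadata at all, `http://www.fivethirtyeight.com/` falls back to `fivethirtyeight`, the label the mockup wants. The public suffix is taken as an input rather than guessed, since a hostname like `www.bbc.co.uk` cannot be handled by simply dropping its last label.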