mozilla / page-metadata-service

DEPRECATED - A RESTful service that returns the metadata about a given URL.
Mozilla Public License 2.0

Return provider_name, provider_display, and provider_url? #90

Closed pdehaan closed 7 years ago

pdehaan commented 7 years ago

We currently get these from Embed.ly, and we may be using them in Activity Stream. Notably in this mockup, we want to display "fivethirtyeight" next to a highlight: https://github.com/mozilla/activity-stream/pull/1153#issuecomment-243483512

Currently Embedly returns a provider_name, provider_display, and provider_url. I think we partially had these before but they were removed because they were redundant at the time: https://github.com/mozilla/page-metadata-service/issues/27

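Given the following URL, we'd probably want to work some magic on its hostname to derive those values ourselves (the second link is the same URL in Embedly's extract explorer, for comparison):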

http://www.fivethirtyeight.com/features/the-broncos-new-quarterback-is-inexperienced-but-at-least-hes-not-peyton-manning/

http://embed.ly/docs/explore/extract?url=http%3A%2F%2Fwww.fivethirtyeight.com%2Ffeatures%2Fthe-broncos-new-quarterback-is-inexperienced-but-at-least-hes-not-peyton-manning%2F

pdehaan commented 7 years ago

A slightly verbose way to do it using tldjs and url.parse():

```js
const tld = require('tldjs');
const url = require('url');

const domain = "https://www.bbc.co.uk/foo/bar.html";

const parsed = url.parse(domain);
// Inject the values from tldjs:
parsed.tldExists = tld.tldExists(domain); // true
parsed.domain = tld.getDomain(domain); // bbc.co.uk
parsed.subdomain = tld.getSubdomain(domain); // www
parsed.publicSuffix = tld.getPublicSuffix(domain); // co.uk
parsed.isValid = tld.isValid(domain); // true

console.log(JSON.stringify(parsed, null, 2));
```
**OUTPUT:**

```json
{
  "protocol": "https:",
  "slashes": true,
  "auth": null,
  "host": "www.bbc.co.uk",
  "port": null,
  "hostname": "www.bbc.co.uk",
  "hash": null,
  "search": null,
  "query": null,
  "pathname": "/foo/bar.html",
  "path": "/foo/bar.html",
  "href": "https://www.bbc.co.uk/foo/bar.html",
  "tldExists": true,
  "domain": "bbc.co.uk",
  "subdomain": "www",
  "publicSuffix": "co.uk",
  "isValid": true
}
```

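For illustration only, here is a rough sketch of how the three provider fields could be derived from a URL's hostname. Everything in it is an assumption rather than what this service or Embedly actually does: `getProvider` and `SAMPLE_SUFFIXES` are hypothetical names, the suffix list is a tiny stand-in for the real Public Suffix List (which tldjs would supply), and the mapping of `provider_name` to the bare domain label and `provider_url` to the scheme plus registrable domain is just one plausible choice.

```js
// Hypothetical helper, not part of page-metadata-service.
// SAMPLE_SUFFIXES is a tiny illustrative stand-in for the Public Suffix List;
// real code would likely rely on a library such as tldjs instead.
const SAMPLE_SUFFIXES = ['co.uk', 'com', 'org', 'net'];

function getProvider(pageUrl, suffixes = SAMPLE_SUFFIXES) {
  const { protocol, hostname } = new URL(pageUrl);

  // Pick the longest known suffix that the hostname ends with.
  const suffix = suffixes
    .filter((s) => hostname.endsWith('.' + s))
    .sort((a, b) => b.length - a.length)[0];
  if (!suffix) {
    return null; // Unknown suffix: nothing sensible to derive.
  }

  // The label immediately left of the suffix is treated as the provider name.
  const labels = hostname.slice(0, -(suffix.length + 1)).split('.');
  const name = labels[labels.length - 1];
  const display = `${name}.${suffix}`;

  return {
    provider_name: name,                  // assumed mapping
    provider_display: display,            // assumed mapping
    provider_url: `${protocol}//${display}`, // assumed mapping (drops subdomains)
  };
}

console.log(getProvider('http://www.fivethirtyeight.com/features/'));
console.log(getProvider('https://www.bbc.co.uk/foo/bar.html'));
```

Dropping the subdomain from `provider_url` is a judgment call; some sites do not serve content on the bare domain, so keeping the original host may be safer in practice.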
jaredlockhart commented 7 years ago

Fixed by #120