kapouer / url-inspector

Get metadata about any url
MIT License
28 stars 8 forks source link
inspector metadata url

url-inspector

Synopsys

npx url-inspector <url>

Description

Get normalized metadata about what a URL mainly represents.

This is a Node.js module.

Sources of information:

Inspection stops when enough information has been gathered, or when a maximum number of bytes (depending on media type) have been downloaded.

Format

Install

url-inspector currently requires those external libraries/tools:

Both programs are well-maintained, and available in most linux distributions.

Usage


import Inspector from 'url-inspector';

const opts = {
 ua: "Mozilla/5.0", // override ua, defaults to somewhat modern browser
 nofavicon: false, // disable additional requests to get a favicon
 nosource: false, // disable main embedded media sub-inspection
 file: true, // local files inspection is only enabled by default when using CLI
 meta: {} // user-entered metadata, to be merged and normalized
 providers: null // custom providers (module path or array)
};

const inspector = new Inspector(opts);

const obj = await inspector.look(url);

Inspector throws http-errors instances.

By default oembed providers are

It is possible to add custom providers in the options, by passing an array or a path to a module exporting an array.

See src/custom-oembed-providers.js for examples.

To normalize an already existing metadata object, including url rewriting done by providers, and other changes in fields, do:

await inspector.norm(obj);

url-inspector uses node-libcurl to make http requests, and exposes it as:

const req = await Inspector.get(urlObj);

where req.abort() stops the request, req.res is the response stream, and res.statusCode, res.headers are available.

Proxy support

url-inspector configures http(s) proxies through proxy-from-env package and environment variables (http_proxy, https_proxy, all_proxy, no_proxy):

Read proxy-from-env documentation.

License

Open Source, see ./LICENSE.