peetzweg / aves

Extracting Domain and Resource from URL #10

Closed peetzweg closed 7 years ago

peetzweg commented 7 years ago

I currently have this, but it's not working well and isn't tested at all. I should definitely write tests for it!

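const parseURL = rawURL => {
    const uriDecoded = decodeURIComponent(rawURL);
    // Reverse the base64 encoding; Buffer.from replaces the deprecated new Buffer().
    const url = Buffer.from(uriDecoded, 'base64').toString('utf8');
    // Group 1: host without a leading "www."; group 2: path, query, and fragment.
    const regex = /^https?:\/\/(?:www\.)?([^\/?#]+)([\/?#].*)?$/i;
    const elements = url.match(regex);
    if (!elements) return null; // not an http(s) URL

    return {
        domain: elements[1],
        resource: elements[2] || '/'
    };
};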
peetzweg commented 7 years ago

https://www.quora.com/Whats-the-best-method-to-extract-article-text-from-HTML-documents

peetzweg commented 7 years ago

This should be done on the client side; it's much easier with the document.location object.

https://developer.mozilla.org/en-US/docs/Web/API/Location
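
A minimal sketch of that client-side approach, in case it helps. The helper name `extractDomainAndResource` is hypothetical, and treating path, query string, and fragment together as the "resource" is an assumption about what is wanted here. It accepts any object shaped like `Location` (the `URL` class exposes the same `hostname`, `pathname`, `search`, and `hash` properties), so it can be tried outside the browser too.

```javascript
// Hypothetical helper, not part of the existing code base.
// Accepts any object shaped like Location or URL (hostname, pathname, search, hash).
const extractDomainAndResource = (loc) => {
    // Strip a leading "www." so "www.example.com" and "example.com" are treated alike.
    const domain = loc.hostname.replace(/^www\./i, '');
    // Assumption: "resource" means path + query string + fragment.
    const resource = loc.pathname + loc.search + loc.hash;
    return { domain, resource };
};

// In a browser: extractDomainAndResource(document.location)
// Outside the browser, a URL instance works the same way:
const example = extractDomainAndResource(
    new URL('https://www.example.com/articles/42?ref=home#top')
);
```

Because the browser has already parsed the address, no regex or base64 round trip is needed, and malformed input never reaches the helper.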