jacktuck / unfurl

Metadata scraper with support for oEmbed, Twitter Cards and Open Graph Protocol for Node.js :zap:
MIT License
475 stars, 51 forks

only absolute urls are supported #33

Closed shaunc869 closed 5 years ago

shaunc869 commented 6 years ago

When I try to run unfurl against this URL: https://www.gohighlevel.com/blog/2018/04/25/the-winner-take-all-world-of-dental-reviews/index.html

I get the error:

<rejected> Error: only absolute urls are supported
    at /Users/shaun/Documents/PycharmProjects/spm-appengine/node_modules/node-fetch/index.js:54:10
    at new Promise (<anonymous>)
    at new Fetch (/Users/shaun/Documents/PycharmProjects/spm-appengine/node_modules/node-fetch/index.js:49:9)
    at Fetch (/Users/shaun/Documents/PycharmProjects/spm-appengine/node_modules/node-fetch/index.js:37:10)
    at /Users/shaun/Documents/PycharmProjects/spm-appengine/node_modules/unfurl.js/index.js:296:14
    at <anonymous>
    at process._tickDomainCallback (internal/process/next_tick.js:228:7) } reason: Error: only absolute urls are supported
    at /Users/shaun/Documents/PycharmProjects/spm-appengine/node_modules/node-fetch/index.js:54:10
    at new Promise (<anonymous>)
    at new Fetch (/Users/shaun/Documents/PycharmProjects/spm-appengine/node_modules/node-fetch/index.js:49:9)
    at Fetch (/Users/shaun/Documents/PycharmProjects/spm-appengine/node_modules/node-fetch/index.js:37:10)
    at /Users/shaun/Documents/PycharmProjects/spm-appengine/node_modules/unfurl.js/index.js:296:14
    at <anonymous>
    at process._tickDomainCallback (internal/process/next_tick.js:228:7)

Is this a known issue, or is there a way to convert these to absolute URLs on the fly? Thanks!

jacktuck commented 6 years ago

Just took a very quick look, and it appears that the website provides a relative link to its external oEmbed data:

<link rel="alternate" type="text/xml+oembed" href="./../../../../wp-json/oembed/1.0/embed/index.html?url=.%2F2018%2F04%2F25%2Fthe-winner-take-all-world-of-dental-reviews%2F&#038;format=xml" />

So I think you've uncovered a few issues:

I'll tackle these sometime over the coming days, or I'm more than happy to accept a PR if you want to dig around :)

Thanks for reporting 👍
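
For anyone wondering how a relative oEmbed link like the one above could be turned into an absolute URL on the fly, here is a minimal sketch using Node's built-in WHATWG `URL` class. It is not necessarily how unfurl fixes this; `resolveLink` is a hypothetical helper name, and the href is assumed to have had its HTML entities decoded already (`&#038;` becomes `&`).

```javascript
// Hypothetical helper (not part of unfurl's API): resolve a possibly
// relative href against the URL of the page it was found on.
function resolveLink(href, pageUrl) {
  // The two-argument URL constructor resolves `href` relative to `pageUrl`.
  // An href that is already absolute is returned unchanged.
  return new URL(href, pageUrl).href;
}

const pageUrl =
  'https://www.gohighlevel.com/blog/2018/04/25/the-winner-take-all-world-of-dental-reviews/index.html';

// The oEmbed href from the page, with the HTML entity already decoded.
const oembedHref =
  './../../../../wp-json/oembed/1.0/embed/index.html' +
  '?url=.%2F2018%2F04%2F25%2Fthe-winner-take-all-world-of-dental-reviews%2F&format=xml';

const resolved = new URL(resolveLink(oembedHref, pageUrl));
console.log(resolved.origin);   // scheme and host come from the page URL
console.log(resolved.pathname); // the "../" segments are collapsed
```

Passing the relative href to a fetch call without resolving it first is what produces the "only absolute urls are supported" error, since there is no base to resolve it against.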

jacktuck commented 6 years ago

Actively looking at this, just struggling to find time.

jacktuck commented 5 years ago

Hi @shaunc869

This should be fixed in the latest major prerelease. You can get it under the beta tag: npm install unfurl.js@beta for now. The schema has changed quite a lot from 1.x.x, so you will probably want to give the docs another look.

Nice find btw

jacktuck commented 5 years ago

This is the result for the link you provided:

{
    "description": "HighLevel",
    "favicon": "https://www.gohighlevel.com/favicon.ico",
    "keywords": "HighLevel",
    "open_graph": {
        "description": "If you’ve shopped online in the last 5 years you’ve no doubt shopped on Amazon.com, in fact let’s face it we all now shop in Amazon.com, maybe even for our dental equipment. Amazon is fast replacing offline retail for many reasons, it has easy 24/7 access to products, fast 2-day shipping, but it’s real value …",
        "images": [
            {
                "height": 450,
                "url": "https://www.gohighlevel.com/wp-content/uploads/2018/03/amazon-online-review-6797-676x450.jpg",
                "width": 676
            }
        ],
        "locale": "en_US",
        "site_name": "HighLevel",
        "title": "The winner-take-all world of dental reviews - HighLevel",
        "type": "article",
        "url": "https://www.gohighlevel.com/2018/04/25/the-winner-take-all-world-of-dental-reviews/"
    },
    "title": "The winner-take-all world of dental reviews",
    "twitter_card": {
        "card": "summary_large_image",
        "description": "If you’ve shopped online in the last 5 years you’ve no doubt shopped on Amazon.com, in fact let’s face it we all now shop in Amazon.com, maybe even for our dental equipment. Amazon is fast replacing offline retail for many reasons, it has easy 24/7 access to products, fast 2-day shipping, but it’s real value […]",
        "images": [
            {
                "url": "https://www.gohighlevel.com/wp-content/uploads/2018/03/amazon-online-review-6797-676x450.jpg"
            }
        ],
        "title": "The winner-take-all world of dental reviews - HighLevel"
    }
}

It's still a WIP, so the schema can change in prerelease versions. For instance, keywords there should probably be an array rather than a string!

EDIT: keywords is fixed now and will be an array
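
If your code has to cope with both shapes while the prerelease schema is still moving, a small defensive sketch like the following may help. `normalizeKeywords` is a hypothetical helper, not part of unfurl, and splitting a string on commas is an assumption about how the keywords meta tag is usually written.

```javascript
// Hypothetical helper: accept `keywords` as either a string (older
// prerelease output) or an array (newer output) and always return an array.
function normalizeKeywords(keywords) {
  if (Array.isArray(keywords)) return keywords;
  if (typeof keywords !== 'string') return [];
  // Assumption: multiple keywords in a string are comma-separated.
  return keywords
    .split(',')
    .map((k) => k.trim())
    .filter(Boolean);
}

console.log(normalizeKeywords('HighLevel'));   // string form, as in the result above
console.log(normalizeKeywords(['HighLevel'])); // array form, after the fix
```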

Please ping me if you notice anything else :)