mozilla / page-metadata-parser

DEPRECATED - A Javascript library for parsing metadata on a web page.
https://www.npmjs.com/package/page-metadata-parser
Mozilla Public License 2.0

Unable to scrape kickstarter pages #70

Closed: pdehaan closed this issue 2 years ago

pdehaan commented 7 years ago

Scraping https://www.kickstarter.com/projects/1280803647/muzo-your-personal-zone-creator-with-noise-blockin/description gives me no usable data: only the favicon and the URL itself come back, and the images list is empty.

Looking at view-source:https://www.kickstarter.com/projects/1280803647/muzo-your-personal-zone-creator-with-noise-blockin/description shows me a bunch of <meta> tags that aren't available when I cURL the page directly, so I'm guessing there are some shenanigans happening on the Kickstarter side.
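One way to confirm that guess is to save the HTML from both sources and diff the <meta> tags. The following is a minimal sketch using only Python's standard library; the names `MetaCollector`, `extract_meta`, and `missing_meta` are hypothetical helpers made up for illustration, and it assumes both copies of the page are already available as strings.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta> tags as {property-or-name: content}."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Open Graph tags use `property`; classic tags use `name`.
        key = attrs.get("property") or attrs.get("name")
        if key and attrs.get("content") is not None:
            self.meta[key] = attrs["content"]


def extract_meta(html):
    """Return the <meta> key/content pairs found in an HTML string."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta


def missing_meta(browser_html, curl_html):
    """Keys present in the browser's copy but absent from the cURL copy."""
    return sorted(set(extract_meta(browser_html)) - set(extract_meta(curl_html)))
```

If `missing_meta` returns a non-empty list, the server is most likely sending different markup to different clients, which would explain the empty result.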

$ http https://page-metadata.services.mozilla.com/v1/metadata urls:='["https://www.kickstarter.com/projects/1280803647/muzo-your-personal-zone-creator-with-noise-blockin/description"]' -j

HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 475
Content-Type: application/json; charset=utf-8
Date: Sat, 24 Sep 2016 00:30:14 GMT
ETag: W/"1db-rGqz4BBDtZ7heHcjgO72Ow"

{
    "request_error": "",
    "url_errors": {},
    "urls": {
        "https://www.kickstarter.com/projects/1280803647/muzo-your-personal-zone-creator-with-noise-blockin/description": {
            "favicon_url": "https://www.kickstarter.com/favicon.ico",
            "images": [],
            "original_url": "https://www.kickstarter.com/projects/1280803647/muzo-your-personal-zone-creator-with-noise-blockin/description",
            "url": "https://www.kickstarter.com/projects/1280803647/muzo-your-personal-zone-creator-with-noise-blockin/description"
        }
    }
}
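To spot this kind of failure without reading the JSON by hand, a small check along these lines could flag URLs that came back without useful metadata. This is only a sketch: `empty_fields` is a hypothetical helper, and the field names in `EXPECTED_FIELDS` are an assumption about what a successful response would contain (only `images` appears in the response above).

```python
import json

# Assumed field names; only "images" is confirmed by the response above.
EXPECTED_FIELDS = ("title", "description", "images")


def empty_fields(response_text, expected=EXPECTED_FIELDS):
    """Map each URL in the response to the expected fields that are
    missing or empty (empty string, empty list, or absent)."""
    body = json.loads(response_text)
    return {
        url: [field for field in expected if not data.get(field)]
        for url, data in body.get("urls", {}).items()
    }
```

For the response above, every assumed field would be reported for the Kickstarter URL, since `images` is an empty list and the others are absent.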