mozilla / page-metadata-parser

DEPRECATED - A Javascript library for parsing metadata on a web page.
https://www.npmjs.com/package/page-metadata-parser
Mozilla Public License 2.0

Getting undefined url response on "bad" request #25

Closed: pdehaan closed this issue 8 years ago

pdehaan commented 8 years ago

I mistakenly tried to curl https://www.cnn.com (which mysteriously doesn't support HTTPS, only HTTP), and the response JSON contains an "undefined" key.

Maybe it's just a case of "don't do that" and we can close this, but it looked weird, and maybe we should be returning an error instead (we currently get an "HTTP/1.1 200 OK" response with an empty "error" field):


$ curl -i -XPOST -H "content-type: application/json" -d '{"urls": ["https://www.cnn.com"]}' http://localhost:7001 | JSON

HTTP/1.1 200 OK
content-type: application/json; charset=utf-8
cache-control: no-cache
content-length: 36
Date: Thu, 30 Jun 2016 20:01:40 GMT
Connection: keep-alive

{
  "error": "",
  "urls": {
    "undefined": {}
  }
}
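A likely explanation for the "undefined" key (an assumption, since the server code is not shown in this thread): the handler keys each result by a value taken from the fetched page, such as its url, and that value is undefined when the HTTPS request fails. JavaScript converts an undefined property key to the string "undefined", which JSON.stringify then emits. The hypothetical keyByPageUrl helper below reproduces that behaviour:

```javascript
// Hypothetical reproduction: results are keyed by a field of the fetched
// page, which is missing when the fetch failed.
function keyByPageUrl(results) {
  const urls = {};
  for (const result of results) {
    // If result.url is undefined, the key becomes the string "undefined".
    urls[result.url] = result.metadata || {};
  }
  return { error: "", urls };
}

// A failed fetch yields neither a url nor any metadata.
const body = keyByPageUrl([{ url: undefined, metadata: undefined }]);
console.log(JSON.stringify(body));
// prints {"error":"","urls":{"undefined":{}}}
```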

pdehaan commented 8 years ago

FWIW, here's the response when requesting the HTTP URL instead of the invalid HTTPS one:

$ curl -i -XPOST -H "content-type: application/json" -d '{"urls": ["http://www.cnn.com"]}' http://localhost:7001 | JSON

HTTP/1.1 200 OK
content-type: application/json; charset=utf-8
cache-control: no-cache
content-length: 560
Date: Thu, 30 Jun 2016 20:02:00 GMT
Connection: keep-alive

{
  "error": "",
  "urls": {
    "http://www.cnn.com": {
      "description": "View the latest news and breaking news today for U.S., world, weather, entertainment, politics and health at CNN.com.",
      "icon_url": "http://i.cdn.turner.com/cnn/.e/img/3.0/global/misc/apple-touch-icon.png",
      "image_url": "http://i.cdn.turner.com/cnn/.e1mo/img/4.0/logos/menu_politics.png",
      "title": "CNN - Breaking News, Latest News and Videos",
      "type": "website",
      "url": "http://www.cnn.com",
      "original_url": "http://www.cnn.com",
      "provider_url": "http://www.cnn.com",
      "favicon_url": "http://www.cnn.com/favicon.ico"
    }
  }
}
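For comparison with the two responses above, here is one way a handler could avoid the mis-keyed entry and surface the failure as an error, as suggested at the top of the issue: key every entry by the URL the client requested and attach a per-URL error when the fetch fails. This is a sketch, not the project's actual handler; fetchMetadata is a hypothetical async function standing in for whatever the server uses to fetch and parse a page.

```javascript
// Sketch only: fetchMetadata is a hypothetical async (url) => metadata
// function standing in for the server's real fetch-and-parse step.
async function buildResponse(requestedUrls, fetchMetadata) {
  // Fetch every URL concurrently; allSettled never rejects as a whole,
  // so one bad URL cannot sink the rest of the batch.
  const settled = await Promise.allSettled(
    requestedUrls.map((url) => fetchMetadata(url))
  );

  const urls = {};
  const failed = [];
  settled.forEach((outcome, i) => {
    // Key by the URL the client sent, never by a field of the fetched page,
    // so a failed fetch cannot produce an "undefined" key.
    const url = requestedUrls[i];
    if (outcome.status === "fulfilled") {
      urls[url] = outcome.value;
    } else {
      const reason = outcome.reason;
      urls[url] = { error: reason instanceof Error ? reason.message : String(reason) };
      failed.push(url);
    }
  });

  return {
    // Populate the top-level "error" field instead of leaving it empty.
    error: failed.length > 0 ? `Failed to fetch: ${failed.join(", ")}` : "",
    urls,
  };
}

// Usage with a stub fetcher that, like the example above, fails for HTTPS.
const stubFetch = async (url) => {
  if (url.startsWith("https://")) throw new Error("connection refused");
  return { title: "Example page", url };
};

buildResponse(["https://www.cnn.com", "http://www.cnn.com"], stubFetch)
  .then((res) => console.log(JSON.stringify(res, null, 2)));
```

Promise.allSettled keeps a single failing URL from rejecting the whole batch. Whether a failure should also set the top-level "error" string, or only the per-URL entry, is a design choice for the API.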

pdehaan commented 8 years ago

The "undefined" key issue seems to be resolved in the latest master (note that the endpoint is now /v1/metadata):

$ http POST http://localhost:7001/v1/metadata urls:='["https://www.cnn.com"]' -j -v

POST /v1/metadata HTTP/1.1
Accept: application/json
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 33
Content-Type: application/json; charset=utf-8
Host: localhost:7001
User-Agent: HTTPie/0.9.1

{
    "urls": [
        "https://www.cnn.com"
    ]
}

HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 22
Content-Type: application/json; charset=utf-8
Date: Thu, 11 Aug 2016 18:08:43 GMT
ETag: W/"16-urTtGfwwfQX5N25qpNbXOg"
X-Powered-By: Express

{
    "error": "",
    "urls": {}
}
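With this change the failed URL no longer appears under an "undefined" key, but it is simply absent from "urls" while "error" stays empty. Until the server reports per-URL failures, a client could detect dropped URLs by comparing its request with the response. The findMissingUrls helper below is hypothetical, not part of page-metadata-parser, and it assumes the server echoes each requested URL back unchanged as the key (as in the HTTP example earlier in the thread).

```javascript
// Hypothetical client-side check: which requested URLs have no entry in the
// response's "urls" object?
function findMissingUrls(requestedUrls, responseBody) {
  const returned = (responseBody && responseBody.urls) || {};
  return requestedUrls.filter(
    (url) => !Object.prototype.hasOwnProperty.call(returned, url)
  );
}

// The request and response from the latest-master example above.
const missing = findMissingUrls(["https://www.cnn.com"], { error: "", urls: {} });
console.log(missing);
// missing is ["https://www.cnn.com"]
```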