pdehaan closed 8 years ago
I just tried this using the latest and got this result:
In [74]: fetch_metadata('https://moz-activity-streams-dev.s3.amazonaws.com/dist/latest.html')
Out[74]:
{u'request_error': u'',
 u'url_errors': {},
 u'urls': {u'https://moz-activity-streams-dev.s3.amazonaws.com/dist/latest.html': {u'favicon_url': u'https://moz-activity-streams-dev.s3.amazonaws.com/favicon.ico',
   u'images': [],
   u'original_url': u'https://moz-activity-streams-dev.s3.amazonaws.com/dist/latest.html',
   u'title': u'latest activity stream experiment addon',
   u'url': u'https://moz-activity-streams-dev.s3.amazonaws.com/dist/latest.html'}}}
Seems to be working now, closing this.
@jaredkerim Yes, I fixed that specific page w/ my glorious https://github.com/mozilla/activity-stream/pull/1076 PR.
But the question remains whether we have the core bug fixed (where an invalid page returns no errors or metadata), or whether we've only covered that case with our improved promise rejection handling and Sentry reporting.
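To make the "no errors and no metadata" case concrete, here is a minimal sketch in Python 3. It assumes the response shape shown in the Out[74] output in this thread (request_error, url_errors, urls); the helper name find_silent_failures and the empty sample response are hypothetical, not part of the service.

```python
def find_silent_failures(requested_urls, response):
    """Return requested URLs that appear in neither `urls` nor `url_errors`."""
    urls = response.get("urls", {})
    url_errors = response.get("url_errors", {})
    return [u for u in requested_urls if u not in urls and u not in url_errors]


page = "https://moz-activity-streams-dev.s3.amazonaws.com/dist/latest.html"

# Hypothetical response: no request error, no per-URL error, no metadata.
silent = {"request_error": "", "url_errors": {}, "urls": {}}

# Response with metadata for the page (trimmed to one field for brevity).
ok = {"request_error": "", "url_errors": {}, "urls": {page: {"url": page}}}

print(find_silent_failures([page], silent))
print(find_silent_failures([page], ok))
```

A check like this could be run over any batch of requested URLs to see whether the core bug still occurs for pages other than the one already fixed.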
Re: https://moz-activity-streams-dev.s3.amazonaws.com/dist/latest.html
My request looks legit, but it seems to be choking somewhere and not returning a result in the urls{} response object. Further investigation needed, but from the looks of view-source:https://moz-activity-streams-dev.s3.amazonaws.com/dist/latest.html it looks very barebones and basic and should work.
Embed.ly returns some response, but it isn't great [because our download page is super minimal]: http://embed.ly/docs/explore/extract?url=https%3A%2F%2Fmoz-activity-streams-dev.s3.amazonaws.com%2Fdist%2Flatest.html
https://validator.w3.org/nu/#textarea seems to suggest that our HTML skills aren't great, and we're throwing invalid markup onto The Internet:
"utf8" for attribute "charset" on element "meta": "utf8" is not a preferred encoding name. The preferred label for this encoding is "utf-8".
"head" is missing a required instance of child element "title".
"style" element between "head" and "body".
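The three validator complaints above can be reproduced locally with a rough sketch like the following, written in Python 3 with the standard library html.parser module. It is not a substitute for the W3C validator: it only looks for these three specific problems, it relies on explicit head and body tags, and the class name MarkupChecker, the function check, and the sample markup are all hypothetical (the sample is not the actual latest.html).

```python
from html.parser import HTMLParser


class MarkupChecker(HTMLParser):
    """Looks for three problems: the 'utf8' charset label, a <head>
    without <title>, and a <style> placed between </head> and <body>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_closed = False
        self.body_seen = False
        self.title_in_head = False
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.body_seen = True
        elif tag == "title" and self.in_head:
            self.title_in_head = True
        elif tag == "meta" and "charset" in attrs:
            charset = attrs["charset"] or ""
            if charset.lower() == "utf8":
                self.problems.append("charset label 'utf8' should be 'utf-8'")
        elif tag == "style" and self.head_closed and not self.body_seen:
            self.problems.append("style element between head and body")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
            self.head_closed = True
            if not self.title_in_head:
                self.problems.append("head is missing a title element")


def check(html):
    """Parse the markup and return the list of problems found."""
    parser = MarkupChecker()
    parser.feed(html)
    parser.close()
    return parser.problems


# Hypothetical minimal page exhibiting all three problems.
sample = (
    '<!DOCTYPE html><html><head><meta charset="utf8"></head>'
    "<style>body { margin: 0 }</style>"
    "<body><a href='addon.xpi'>download</a></body></html>"
)

for problem in check(sample):
    print(problem)
```

Fixing the sample would mean using charset="utf-8", adding a title inside head, and moving the style element into head.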