Open shawnmjones opened 6 years ago
`//abs.twimg.com/favicons/favicon.ico` is a scheme-less URI; it inherits the scheme of its parent URI. The point is to keep browsers from freaking out over http and https combos of embedded images, css, etc.
https://www.google.com/search?q=schemeless+url&ie=utf-8&oe=utf-8&client=firefox-b-1
it's a stupid thing that needs to die, but it will live on in archives...
If the context of the favicon (i.e., referrer document) is available, then you can use the scheme from that, or simply use one of `http` or `https`. Generally, archives are going to normalize the scheme anyway (except a few).
This favicon comes from the content of the URI-R, so I should be able to follow the suggestion from @ibnesayeed and just use the scheme of the referrer.
If it were part of the memento, I could just use the scheme of the original URI and use datetime negotiation to determine if the archive stored it.
To account for these, I'll have to test each favicon (and image) URI to ensure that it has a scheme and take appropriate action if the scheme does not exist.
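A minimal sketch of that check in Python, assuming only the standard library. `ensure_scheme` is a hypothetical helper name, not something in the project, and the `default_scheme` fallback of `https` is an assumption for the case where no referrer is known:

```python
from urllib.parse import urljoin, urlsplit


def ensure_scheme(uri, referrer=None, default_scheme="https"):
    """Return uri with a scheme, borrowing it from referrer when possible.

    Hypothetical helper for illustration; default_scheme is an assumed fallback.
    """
    if urlsplit(uri).scheme:
        # Already has a scheme (e.g. http:// or https://), leave it alone.
        return uri

    if referrer and urlsplit(referrer).scheme:
        # urljoin resolves scheme-less references such as //host/path
        # against the referrer, so the result inherits the referrer's scheme.
        return urljoin(referrer, uri)

    if uri.startswith("//"):
        # No usable referrer: fall back to the assumed default scheme.
        return f"{default_scheme}:{uri}"

    # A relative path with no referrer cannot be resolved.
    raise ValueError(f"cannot determine a scheme for {uri!r}")


favicon = ensure_scheme(
    "//abs.twimg.com/favicons/favicon.ico",
    referrer="http://archive.is/Q8pGf",
)
print(favicon)
```

The resulting URI can then be handed to the requests library as usual. Whether the referrer should be the URI-R or the URI-M depends on where the favicon was discovered, as discussed above.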
For http://archive.is/Q8pGf the favicon URI `//abs.twimg.com/favicons/favicon.ico` has no scheme, and thus the requests library does not know what to do with it.
I'm not sure what the best expected behavior is here.
See #86.
(updated to remove typo)