machawk1 / warcreate

Chrome extension to "Create WARC files from any webpage"
https://warcreate.com
MIT License
206 stars, 13 forks

Many pages produce superfluous "null" appended URIs in metadata #70

Closed: machawk1 closed this issue 9 years ago

machawk1 commented 9 years ago

For example:

WARC/1.0
WARC-Type: metadata
WARC-Target-URI: http://www.cs.odu.edu/~mweigle/
WARC-Date: 2015-08-20T00:23:30Z
WARC-Concurrent-To: <urn:uuid:dddc4ba2-c1e1-459b-8d0d-a98a20b87e96>
WARC-Record-ID: <urn:uuid:6fef2a49-a9ba-4b40-9f4a-5ca5db1fd5c6>
Content-Type: application/warc-fields
Content-Length: 7934

outlink: http://www.cs.odu.edu/~mweigle/pics/mweigle-ODU.jpg E =EMBED_MISC
outlink: http://www.cs.odu.edu/~mweigle/icons/odu-color.png E =EMBED_MISC
outlink: http://www.cs.odu.edu/~mweigle/icons/odu2l.png E =EMBED_MISC
outlink: http://www.cs.odu.edu/~mweigle/pmwiki/pub/skins/mweigle/mweigle.css E =EMBED_MISC
outlink: http://www.cs.odu.edu/~mweigle/null E =EMBED_MISC
outlink: http://www.cs.odu.edu/~mweigle/null E =EMBED_MISC
outlink: http://www.cs.odu.edu L a/@href
outlink: http://www.odu.edu L a/@href
...

machawk1 commented 9 years ago

The issue originates in content.js, in how the href of CSS files is handled.
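
The thread does not show the relevant content.js code, so the following is only a sketch of one plausible cause and guard, not WARCreate's actual fix. It assumes the outlinks are built from `document.styleSheets`, where inline `<style>` elements are exposed with `href === null`; resolving that value against the page URI as if it were a string yields `.../null`. The function name `stylesheetOutlinks` and the plain objects standing in for `document.styleSheets` are hypothetical.

```javascript
// Hypothetical helper (not taken from WARCreate): builds "outlink:" lines
// for stylesheets, skipping entries that have no href.
function stylesheetOutlinks(styleSheets, baseUri) {
  const lines = [];
  for (const sheet of Array.from(styleSheets)) {
    // Inline <style> elements have href === null; skip them so that
    // no ".../null" URI is emitted.
    if (!sheet.href) continue;
    const resolved = new URL(sheet.href, baseUri).href;
    lines.push("outlink: " + resolved + " E =EMBED_MISC");
  }
  return lines;
}

// Stand-in for document.styleSheets: one linked sheet, two inline sheets.
const sheets = [
  { href: "http://www.cs.odu.edu/~mweigle/pmwiki/pub/skins/mweigle/mweigle.css" },
  { href: null },
  { href: null },
];
const base = "http://www.cs.odu.edu/~mweigle/";

// Without the guard, a null href coerced to a string resolves like this:
const unguarded = new URL(String(null), base).href;

console.log(unguarded);
console.log(stylesheetOutlinks(sheets, base));
```

With the guard in place, only the linked stylesheet produces an outlink line; the two inline sheets, which would otherwise account for the two duplicate `null` entries in the record above, are dropped.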