internetarchive / wayback

IA's public Wayback Machine (moved from SourceForge)

xhtml pages broken #156

Open cweiske opened 6 years ago

cweiske commented 6 years ago

My blog serves pages with the application/xhtml+xml MIME type, which means browsers use the XML parsing mode and do not try to render anything if the XML is not well-formed.

The Internet Archive software sends the same Content-Type header for the archived copy of my page, but it also injects HTML at the top of the page that is not well-formed according to XML rules.

For example: https://web.archive.org/web/20170606065241/http://cweiske.de/tagebuch/cacert-bye.htm

Firefox 55 and Chromium 60 both stop rendering and show an error:

Chromium:

This page contains the following errors:
error on line 141 at column 43: Opening and ending tag mismatch: br line 0 and div

Firefox:

XML processing error: Non-matching tag. Expected: </br>.
Address: https://web.archive.org/web/20170606065241/http://cweiske.de/tagebuch/cacert-bye.htm
Line 141, Column 39:
Web wide crawl number 16<div><br></div><div>The seed list for Wide00016 was made from the join of the top 1 million domains from CISCO and the top 1 million domains from Alexa.</div><div><br><div><br></div><div><br></div></div>
---------------------------------------------^

One way to fix that bug would be to make your embed code XHTML-compatible.
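
To illustrate why the injected markup trips XML parsers, here is a minimal sketch using Python's standard library. The helper name `is_well_formed` and the two sample fragments are illustrative assumptions, not part of the Wayback code; the first fragment mirrors the `<div><br></div>` pattern visible in the error output above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# HTML-style void element: <br> is never closed, so </div> mismatches.
html_style = "<div><br></div>"

# XHTML-compatible form: the empty element is self-closed.
xhtml_style = "<div><br/></div>"

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

An XHTML-compatible embed would have to use the second form (or an explicit `<br></br>`) everywhere, since a single unclosed element is enough for the browser to refuse to render the whole page.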

Sobsz commented 2 months ago

Still reproducible 7 years later: https://web.archive.org/web/20210222222314/https://tilde.club/~acz/shadow_wiki/browsers.xhtml

For anyone else who finds this in the future: adding if_ at the end of the datetime segment (e.g. /20210222222314if_/) removes the header and gets around the issue.
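
The `if_` workaround can be applied mechanically. The following is a rough sketch, assuming the URL layout seen in the links in this thread (`https://web.archive.org/web/<digits>/<original URL>`); the function name `with_if_suffix` is made up for illustration, and URLs that do not match that layout are returned unchanged.

```python
import re

# Matches the layout used by the links in this thread:
#   https://web.archive.org/web/<timestamp digits>/<original URL>
# A URL whose timestamp already carries a suffix will not match.
_PLAIN_WAYBACK = re.compile(r"^(https?://web\.archive\.org/web/\d+)(/.*)$")


def with_if_suffix(url: str) -> str:
    """Append if_ to the datetime segment (hypothetical helper)."""
    match = _PLAIN_WAYBACK.match(url)
    if match is None:
        # Not a plain Wayback URL (or already suffixed): leave it alone.
        return url
    prefix, rest = match.groups()
    return f"{prefix}if_{rest}"


print(with_if_suffix(
    "https://web.archive.org/web/20210222222314/"
    "https://tilde.club/~acz/shadow_wiki/browsers.xhtml"
))
```

Calling the function on an already rewritten URL is a no-op, so it is safe to apply to a mixed list of links.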

CanYouJustWorkPlease commented 3 weeks ago

@Sobsz Thanks! That was really helpful!