Closed: lwrubel closed this issue 2 years ago
Both of the instances above look like the original site may have had embedded `iframe`s? Or perhaps I'm misunderstanding what's going on.
Yeah, I think I misunderstood. For French Politics, it appears as though the footer is rendering in the wrong spot. And SLAC is showing different behavior altogether.
@lwrubel I'm unable to reproduce the first screen shot. When not found I get "0 results" and the page looks fine. If you can reproduce, can you send me the link? Thanks.
@aaron-collier https://swap.stanford.edu/was/20160827002355/http://artgoldhammer.blogspot.com/ looks similar
I believe this is an issue with the captured pages' HTML and/or JavaScript (though I'm not sure how to look at that individually - maybe unpacking a WACZ file or something?)
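If a WACZ for one of these captures is available, a minimal sketch for peeking inside it could look like the following. A WACZ is a ZIP container, so the standard `zipfile` module can list its members. The function names are hypothetical, and the assumption that the WARC files sit under `archive/` should be checked against the actual package.

```python
import zipfile


def list_wacz_members(path_or_file):
    """Return the sorted member names of a WACZ (a ZIP container)."""
    with zipfile.ZipFile(path_or_file) as wacz:
        return sorted(wacz.namelist())


def warc_members(names):
    """Keep only entries that look like WARC files.

    Assumption: WARC data lives under archive/ inside the package.
    """
    return [
        name
        for name in names
        if name.startswith("archive/") and name.endswith((".warc", ".warc.gz"))
    ]
```

From there the WARC records themselves would still need a WARC reader to get at the captured HTML and JavaScript.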
You can see here that the top `#document` is the main page. For the pages that exhibit this behavior, there is a second `#document`, which indicates, to me, that pywb is inserting somewhere and the captured pages have an unexpected selector that happens to match. The second `#document` is not found on pages that do not exhibit this behavior.
I'm not sure we can fix this locally; it likely has to be addressed upstream. I'd be happy to open a ticket there if we wish.
It may be as simple as pywb selecting an `iframe` on the page (and inadvertently selecting all iframes), so if a page has one we get this odd behavior... Or an `iframe` with a particular id/class...
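One way to test that guess would be to compare the iframes on pages that do and do not show the problem. Here is a rough sketch using only the standard library; `IframeFinder` and `find_iframes` are hypothetical names, and this only inspects HTML you have already saved, it does not talk to pywb.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the id, class, and src of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "iframe":
            attr_map = dict(attrs)
            self.iframes.append(
                {
                    "id": attr_map.get("id"),
                    "class": attr_map.get("class"),
                    "src": attr_map.get("src"),
                }
            )


def find_iframes(html):
    """Return one dict per iframe found in the given HTML string."""
    finder = IframeFinder()
    finder.feed(html)
    finder.close()
    return finder.iframes
```

If the affected pages all share an iframe id or class that the unaffected pages lack, that would support the selector theory.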
Unless @lwrubel or @edsu have other ideas, we may want to ping Ilya or other pywb users about this.
Sorry about not including a URL for the first screenshot, @aaron-collier. I can't replicate that now, but here's another error page (don't worry about the particular error, that's an indexing issue): https://swap.stanford.edu/was/20170706223605/http://enchantingthedesert.com/
I looked at the issue with error pages, and I'm coming to realize that one possible aspect of this is that we seem to be adding the `header.html` into all pages, including `frame_insert.html`. Other pywb instances are not adding the header on `frame_insert.html`. They're just putting a logo in the existing gray bar and possibly styling the color of it. The way we have this set up, the header displays twice because we're adding it in `base.html` and `frame_insert.html`.
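To confirm which templates pull in the header, something like this sketch could scan the templates directory. It assumes the header is added with a Jinja2 `{% include "header.html" %}` tag, which may not match how our templates actually do it; `templates_including` is a hypothetical helper, not part of pywb.

```python
import re
from pathlib import Path

# Matches Jinja2 include tags such as {% include "header.html" %}
# or {%- include 'header.html' %} and captures the template name.
INCLUDE_RE = re.compile(r"""\{%-?\s*include\s+['"]([^'"]+)['"]""")


def templates_including(template_dir, target="header.html"):
    """Return names of *.html templates that include the target template."""
    hits = []
    for path in sorted(Path(template_dir).glob("*.html")):
        text = path.read_text(encoding="utf-8")
        if target in INCLUDE_RE.findall(text):
            hits.append(path.name)
    return hits
```

If both `base.html` and `frame_insert.html` come back, that would match the double header described above.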
See: https://www.webarchive.org.uk/wayback/archive/20130415225159/https://www.gov.uk/government/how-government-works https://www.webarchive.org.uk/wayback/archive/*/https://www.bl.uk/
Based on conversation with @aaron-collier, we'll remove the header from `frame_insert.html`, see if we can get a logo in there at a minimum, and discuss with @peterchanws on 7/11/22.
pywb is showing the red Stanford header inside the frame as well.