sul-dlss / was-pywb

Configuration for Stanford's pywb instance
https://swap.stanford.edu

Header also displays inside frame occasionally #86

Closed by lwrubel 2 years ago

lwrubel commented 2 years ago

pywb is showing the red Stanford header inside the frame as well.

[Screenshot: Screen Shot 2022-07-05 at 7 17 01 PM]

lwrubel commented 2 years ago

See also: https://swap.stanford.edu/was/20160827002355/http://artgoldhammer.blogspot.com/

[Screenshot: Screen Shot 2022-07-06 at 4.34.51 PM.png]

justinlittman commented 2 years ago

For https://swap.stanford.edu/was/19900101120000/http://www.slac.stanford.edu/: [screenshot]

mjgiarlo commented 2 years ago

Both of the instances above look like the original site may have had embedded iframes? Or perhaps I'm misunderstanding what's going on.

Yeah, I think I misunderstood. For French Politics, it appears as though the footer is rendering in the wrong spot. And SLAC is showing different behavior altogether.

aaron-collier commented 2 years ago

@lwrubel I'm unable to reproduce the first screenshot. When a page is not found, I get "0 results" and the page looks fine. If you can reproduce it, can you send me the link? Thanks.

mjgiarlo commented 2 years ago

@aaron-collier https://swap.stanford.edu/was/20160827002355/http://artgoldhammer.blogspot.com/ looks similar

aaron-collier commented 2 years ago

I believe this is an issue with the captured page's HTML and/or JavaScript (though I'm not sure how to look at that on its own; maybe by unpacking a WACZ file or something?).
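On the "unpacking a WACZ file" idea: a WACZ file is a ZIP archive, so Python's standard `zipfile` module can open one. The following is only a minimal sketch, assuming the capture in question is available as a WACZ at all (that is not confirmed in this thread). The function names are hypothetical helpers, not part of pywb.

```python
import zipfile


def list_wacz_members(wacz_file):
    """Return the member names of a WACZ file, which is a ZIP archive.

    `wacz_file` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wacz_file) as archive:
        return archive.namelist()


def read_member(wacz_file, member_name):
    """Return the raw bytes of one member (for example a WARC or page list)."""
    with zipfile.ZipFile(wacz_file) as archive:
        return archive.read(member_name)
```

Listing the members first shows which WARC files are inside; the captured HTML itself would still need a WARC reader to extract.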

You can see here that the top #document is the main page. For the pages that exhibit this behavior, there is a second #document, which indicates to me that pywb is inserting content somewhere and the captured pages have an unexpected selector that happens to match. The second #document is not found on pages that do not exhibit this behavior.

I'm not sure we can fix this locally; it likely has to be addressed upstream. I'd be happy to open a ticket there if we wish.

[Screenshot: Screen Shot 2022-07-07 at 11 29 19 AM]

aaron-collier commented 2 years ago

It may be as simple as pywb selecting an iframe on the page (and inadvertently selecting all iframes), so if a page has one we get this odd behavior... Or an iframe with a particular id/class...

https://github.com/webrecorder/pywb/blob/626da99899865e7f9bf9bfdd775218b36d6a2567/pywb/static/wb_frame.js#L58
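To illustrate the hypothesis (not pywb's actual code), here is a small Python sketch of the difference between matching every iframe in a document and matching only one by id. The id `replay_iframe`, the sample markup, and the helper names are assumptions made for the example.

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collect the attributes of every <iframe> start tag in a document."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframes.append(dict(attrs))


def find_iframes(html, iframe_id=None):
    """Return all iframes, or only those whose id equals `iframe_id`."""
    collector = IframeCollector()
    collector.feed(html)
    collector.close()
    if iframe_id is None:
        return collector.iframes
    return [f for f in collector.iframes if f.get("id") == iframe_id]


# Hypothetical markup: one replay iframe plus an iframe from the captured site.
SAMPLE = """
<iframe id="replay_iframe" src="/was/mp_/http://example.com/"></iframe>
<iframe class="widget" src="http://example.com/widget"></iframe>
"""

print(len(find_iframes(SAMPLE)))                   # broad match: every iframe
print(len(find_iframes(SAMPLE, "replay_iframe")))  # narrow match: by id only
```

If the broad form were in play, any captured page that brings its own iframe would be affected, which would fit the pattern described above.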

mjgiarlo commented 2 years ago

Unless @lwrubel or @edsu have other ideas, we may want to ping Ilya or other pywb users about this.

lwrubel commented 2 years ago

Sorry about not including a URL for the first screenshot, @aaron-collier. I can't replicate that now, but here's another error page (don't worry about the particular error, that's an indexing issue): https://swap.stanford.edu/was/20170706223605/http://enchantingthedesert.com/

lwrubel commented 2 years ago

I looked at the issue with error pages, and I'm coming to realize that one possible aspect of this is that we seem to be adding header.html to all pages, including frame_insert.html. Other pywb instances do not add the header in frame_insert.html. They just put a logo in the existing gray bar and possibly restyle its color. The way we have this set up, the header displays twice because we add it in both base.html and frame_insert.html.

See:
https://www.webarchive.org.uk/wayback/archive/20130415225159/https://www.gov.uk/government/how-government-works
https://www.webarchive.org.uk/wayback/archive/*/https://www.bl.uk/
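The double-header effect described above can be modeled with a toy example. This is a sketch under stated assumptions, not our real templates: it uses Python's `string.Template` rather than Jinja2, the header markup and banner are hypothetical, and the inner page is inlined into the iframe only so the headers can be counted in one string.

```python
from string import Template

# Hypothetical stand-in for the contents of header.html.
HEADER = '<div id="stanford-header">Stanford header</div>'

# Toy stand-ins for base.html and two variants of frame_insert.html.
BASE = Template("$header<main>$body</main>")
FRAME_WITH_HEADER = Template("$header<iframe>$inner</iframe>")
FRAME_WITHOUT_HEADER = Template('<div class="banner">logo</div><iframe>$inner</iframe>')


def render(frame_template, body):
    """Render an inner page from BASE, then wrap it in the given frame."""
    inner = BASE.substitute(header=HEADER, body=body)
    return frame_template.substitute(header=HEADER, inner=inner)


def count_headers(html):
    """Count how many times the header markup appears in the output."""
    return html.count('id="stanford-header"')


print(count_headers(render(FRAME_WITH_HEADER, "0 results")))     # header in both
print(count_headers(render(FRAME_WITHOUT_HEADER, "0 results")))  # header in base only
```

In this model, dropping the header from the frame template leaves exactly one copy, the one that comes from the base template.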

lwrubel commented 2 years ago

Based on conversation with @aaron-collier, we'll remove the header from frame_insert.html, see if we can get a logo in there at a minimum, and discuss with @peterchanws on 7/11/22.