webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

A fix for GitHub issue #865 #917

Open lasztoth opened 3 months ago

lasztoth commented 3 months ago

Description

This PR fixes issue #865. The bug appears to have been introduced (perhaps by accident) when lines 288-289 of responseloader.py were commented out. These lines were commented out between versions 2.6.9 and 2.7.0b, which matches the timeline in the issue description. Uncommenting them, i.e., returning from the method if the record already has a WARC filename and offset, makes self-redirects work correctly once again (verified on a local test system).
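For readers unfamiliar with the check being restored, here is a minimal sketch of the early-return pattern described above. It is not the actual code from responseloader.py: the function names `has_warc_location` and `load_resource`, the `fetch_live` callback, and the use of a plain dict for the index record are all assumptions made for illustration.

```python
def has_warc_location(cdx):
    """Return True if the index record already points into a WARC file.

    `cdx` is a plain dict standing in for an index record (hypothetical).
    An offset of 0 is valid, so the offset is compared against None.
    """
    return bool(cdx.get("filename")) and cdx.get("offset") is not None


def load_resource(cdx, fetch_live):
    """Hypothetical loader: skip the live fetch for already-archived records.

    `fetch_live` is an assumed callback that takes a URL and returns a response.
    """
    if has_warc_location(cdx):
        # The record is already stored in a WARC; returning None here
        # leaves it to whichever loader reads from the archive file.
        return None
    return fetch_live(cdx["url"])
```

In this sketch, a record with both a filename and an offset never reaches `fetch_live`, while a record without them is fetched as usual.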

Motivation and Context

Solves #865.

tw4l commented 3 months ago

I have a vague memory now of this being commented out in relation to the development of the Vue banner UI, but for the life of me can't remember why. Will try to dig back through my notes from that time!

tw4l commented 3 months ago

Ah, I dug back through our internal Discord conversations and remember now! This was our hacky temporary solution at the time to get pywb to populate the calendar from a remote CDX server of the form cdx+https://example.com/webarchive/cdx in pywb 2.6.8. Without these lines commented out, we'd receive "No such file or directory" errors because the actual archive files weren't available locally. It likely got committed by accident as part of the Vue banner work, as the PR we ended up merging at the time was very large.

Edit: It seems this is still true in latest main. If you want to load an archive from a remote CDX source and actually be able to view the content, it's still necessary to comment out these lines. But there should be a better fix possible here.

lasztoth commented 3 months ago

Thanks a lot @tw4l for taking a look at this!

obrienben commented 2 months ago

@tw4l are you looking into a fix that addresses both the calendar issue you mentioned and #865?

tw4l commented 2 months ago

Hi @lasztoth and @obrienben, I'm having some health issues that have limited my capacity at the moment. I'll be out of the office next week but will try to prioritize this when I return.