webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

Incorrect URLS and redirect loop #40

Closed: protonpopsicle closed this issue 10 years ago.

protonpopsicle commented 10 years ago

Looking at the list of captures: mydomain.com/pywb/*/<url>

On localhost, the links have the format: mydomain.com/pywb/<id>/<url>

On my deployed site, the links are: mydomain.com/pywb/<current page url>/<id>/<url>
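The two link shapes above can be told apart mechanically. The following is a minimal sketch, not part of pywb: `parse_replay_link`, `ReplayLink` and `REPLAY_RE` are hypothetical names, and the pattern assumes the collection is mounted at `/pywb` and that `<id>` is a full 14-digit timestamp, as in the examples further down in this thread.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the localhost link format: <host>/pywb/<id>/<url>
# ASSUMPTION: <id> is a 14-digit timestamp such as 20121011215544.
REPLAY_RE = re.compile(
    r"^(?P<prefix>https?://[^/]+/pywb)/(?P<timestamp>\d{14})/(?P<url>.+)$"
)


class ReplayLink(NamedTuple):
    prefix: str     # scheme, host and collection path
    timestamp: str  # capture timestamp
    url: str        # archived URL, kept exactly as it appears in the link


def parse_replay_link(link: str) -> Optional[ReplayLink]:
    """Split a replay link into its parts, or return None if it doesn't match."""
    m = REPLAY_RE.match(link)
    if m is None:
        return None
    return ReplayLink(m["prefix"], m["timestamp"], m["url"])


if __name__ == "__main__":
    localhost_link = "http://localhost:8080/pywb/20121011215544/http://www.rtmark.com/"
    deployed_link = (
        "http://mydomain.com/pywb/*/http://rtmark.com/"
        "20121011215544/http:/www.rtmark.com/"
    )
    print(parse_replay_link(localhost_link))  # prefix, timestamp and url fields
    print(parse_replay_link(deployed_link))   # None: "*" sits where the timestamp should be
```

Applied to the localhost and deployed links reported in this thread, the first splits cleanly into prefix, timestamp and archived URL, while the second does not match at all.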

If I manually enter the correct URL for a capture, I get a redirect loop. This is not a problem on localhost.

My environment is set up the same way as described in https://github.com/ikreymer/pywb/issues/39, so this is probably an issue with how I have things set up.

ikreymer commented 10 years ago

Hmm, not sure I'm fully understanding this issue. Can you provide exact examples of localhost vs. deployed URLs? Are the generated URLs incorrect, or do they result in a redirect loop?

protonpopsicle commented 10 years ago

Yeah, sorry. It's both issues: the generated URLs are incorrect, and in addition, entering the correct URL for a capture results in a redirect loop.

Real URL examples:

On this page:

localhost: http://localhost:8080/pywb/*/rtmark.com

deployed: http://mydomain.com/pywb/*/rtmark.com

The URLs for a capture are different:

localhost: http://localhost:8080/pywb/20121011215544/http://www.rtmark.com/

deployed: http://mydomain.com/pywb/*/http://rtmark.com/20121011215544/http:/www.rtmark.com/

The one on the deployed site goes to a page saying: "No Captures found for: http://rtmark.com/20121011215544/http:/www.rtmark.com/"

On localhost, it works properly.

If I try going to this URL, which I assume is the correct URL for the capture:

deployed: http://mydomain.com/pywb/20121011215544/http://www.rtmark.com/

It results in a redirect loop.

ikreymer commented 10 years ago

I wonder if something is off in the nginx config. Can you try accessing pywb directly, without nginx, on port 8001?
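When comparing access through nginx with direct access on port 8001, it can help to look at the raw redirect chain rather than the browser's "too many redirects" error. This is a hedged sketch using only the Python standard library: `trace_redirects` and `fetch_once` are hypothetical helpers, not pywb tools, and they handle plain HTTP only. A loop shows up as a URL that repeats in the chain.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher takes a URL and returns (HTTP status, Location header or None)
# for ONE request, without following redirects.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def fetch_once(url: str) -> Tuple[int, Optional[str]]:
    """Send a single plain-HTTP GET; http.client never follows redirects."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(
    url: str, fetch: Fetcher = fetch_once, max_hops: int = 10
) -> Tuple[List[str], bool]:
    """Follow Location headers starting at `url`.

    Returns the chain of URLs visited and True if a URL repeated (a loop).
    Stops at the first non-redirect response or after `max_hops` requests.
    """
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_STATUSES or not location:
            return chain, False
        # Location may be relative, so resolve it against the current URL.
        next_url = urljoin(chain[-1], location)
        looped = next_url in chain
        chain.append(next_url)
        if looped:
            return chain, True
    return chain, False
```

On the server, calling `trace_redirects("http://localhost:8001/pywb/20121011215544/http://www.rtmark.com/")` and then the same path via `http://mydomain.com` would show whether the loop appears only behind nginx, and the chain shows exactly how the path changes from one hop to the next.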

Also, try uncommenting and changing these settings so that pywb uses relative paths, and add your domain to hostpaths (although that is more useful for proxy mode):

hostpaths: ['http://localhost:8080/', 'mydomain.com/']
absolute_paths: false

ikreymer commented 10 years ago

For the self-redirect issue, check whether there are multiple CDX entries for the URL by using the CDX API: pywb-cdx?url=rtmark.com/

Are there multiple entries for that url, and is one of them a redirect? If not, it's probably related to the path issue.
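The CDX check can also be scripted. This is a sketch, not pywb code: `query_cdx`, `parse_cdx`, `summarize` and `CdxEntry` are hypothetical names, the `pywb-cdx` endpoint path is taken from the comment above, and the parser assumes the endpoint returns plain space-separated CDX lines in the classic field order, with the HTTP status in the fifth column.

```python
import urllib.request
from typing import List, NamedTuple, Tuple
from urllib.parse import urlencode


class CdxEntry(NamedTuple):
    timestamp: str
    original: str
    status: str


def parse_cdx(text: str) -> List[CdxEntry]:
    """Parse plain CDX lines.

    ASSUMPTION: classic space-separated field order
    (urlkey, timestamp, original, mimetype, statuscode, ...).
    Blank lines and a leading ' CDX ...' header line are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "CDX":
            continue
        entries.append(CdxEntry(timestamp=fields[1], original=fields[2], status=fields[4]))
    return entries


def query_cdx(base: str, url: str) -> List[CdxEntry]:
    """GET <base>/pywb-cdx?url=<url> and parse the response body."""
    query = urlencode({"url": url})
    with urllib.request.urlopen(f"{base}/pywb-cdx?{query}", timeout=10) as resp:
        return parse_cdx(resp.read().decode("utf-8"))


def summarize(entries: List[CdxEntry]) -> Tuple[int, int]:
    """Return (number of entries, number of 3xx redirect entries)."""
    redirects = sum(1 for e in entries if e.status.startswith("3"))
    return len(entries), redirects
```

On the deployed server, `summarize(query_cdx("http://mydomain.com", "rtmark.com/"))` would report how many captures exist and how many are 3xx responses; more than one entry with a redirect among them points at the self-redirect case rather than the path configuration.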

protonpopsicle commented 10 years ago

There are not multiple CDX entries. The hostpaths and absolute_paths settings are useful. I am closing this because it appears to be an issue with my web server configuration, not with pywb.

ikreymer commented 10 years ago

np, feel free to reopen if you find any ways pywb could make the config simpler, or any possible 'gotchas'.