maeb closed this issue 3 years ago
Referencing my previous issue #656 here. That issue concerned encoding of the url parameter in the query string of the request between the frontend and the backend. This issue concerns encoding of the same parameter in the request between the backend and the warcserver (as configured via the replay_url).
Just to confirm, this was for use with OutbackCDX, right? Or some other configuration?
We use our own indexer and loader backend, https://github.com/nlnwa/gowarcserver.
Our config looks something like:
collections:
  veidemann:
    index:
      type: cdx
      api_url: http://gowarcserver:9999/warcserver/all/index?url={url}&closest={closest}
      replay_url: http://gowarcserver:9999/warcserver/all/resource?url={url}&closest={timestamp}&output=content
Formatting of load_url does not encode the url parameter properly if it ends up in the query string of the configured url_field (replay_url):
https://github.com/webrecorder/pywb/blob/843fe28ed8cc497c3a11345243dbcfc288455337/pywb/warcserver/index/indexsource.py#L160-L162
Some URLs do not survive query parameter parsing unscathed when the url parameter is part of the query string of the load_url.
This seems to fix the issue:
I believe this is a proper fix without breaking changes, but I am not sure. Shall I post a PR?
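To illustrate the failure mode described above, here is a minimal standalone sketch. It is not pywb code: the helper names (`format_raw`, `format_quoted`, `parsed_url_param`) and the example captured URL are hypothetical, and percent-encoding with `urllib.parse.quote` is only one possible remedy, assumed here for illustration. The template is the replay_url from the config above.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Template copied from the replay_url in the configuration above.
REPLAY_URL = ("http://gowarcserver:9999/warcserver/all/resource"
              "?url={url}&closest={timestamp}&output=content")


def format_raw(url, timestamp):
    """Substitute the captured URL verbatim, without percent-encoding."""
    return REPLAY_URL.format(url=url, timestamp=timestamp)


def format_quoted(url, timestamp):
    """Percent-encode the captured URL before substitution (assumed remedy)."""
    return REPLAY_URL.format(url=quote(url, safe=""), timestamp=timestamp)


def parsed_url_param(load_url):
    """Return the `url` parameter as a query-string parser would see it."""
    return parse_qs(urlsplit(load_url).query)["url"][0]


# Hypothetical captured URL that carries its own query string.
captured = "http://example.com/search?q=a&page=2"
ts = "20200101000000"

# Without encoding, everything after the inner "&" is split off
# into separate parameters of the outer request.
raw = parsed_url_param(format_raw(captured, ts))

# With encoding, the captured URL round-trips intact.
quoted = parsed_url_param(format_quoted(captured, ts))

print(raw)     # http://example.com/search?q=a
print(quoted)  # http://example.com/search?q=a&page=2
```

Whether the receiving server expects the value to be encoded exactly once, and whether index entries may already contain percent-encoded characters, are assumptions that would need checking against the actual backend.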