Closed by sberequek 1 year ago
Proxy requests are handled by the proxy(HttpExchange) method, which calls replay() with the proxy argument set to true.
Proxy requests can be distinguished from normal requests by exchange.request().target() being an absolute URL (currently this is done just by proxy being the default fallthrough route).
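For illustration, here is a rough sketch of what an explicit "is the target an absolute URL" check could look like. This is not jwarc's code: the class ProxyTargetCheck and the helper isProxyRequest are hypothetical names made up for this example, and it only uses java.net.URI from the standard library. It assumes you already have the request target as a string (for example from exchange.request().target()).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of jwarc.
class ProxyTargetCheck {
    // Returns true when the request target looks like an absolute URL
    // (scheme plus host), which is the form a client sends to an HTTP proxy.
    // Origin-form targets such as "/replay/..." have no scheme, so they
    // return false.
    static boolean isProxyRequest(String target) {
        try {
            URI uri = new URI(target);
            return uri.isAbsolute() && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // A target that cannot be parsed is treated as a normal request.
            return false;
        }
    }

    public static void main(String[] args) {
        // Target as sent by: curl --proxy http://localhost:8080 http://www.example.org/
        System.out.println(isProxyRequest("http://www.example.org/")); // prints "true"
        // Target as sent by a normal replay request
        System.out.println(isProxyRequest("/replay/20230101000000/http://www.example.org/")); // prints "false"
    }
}
```

The host check is there so that a bare host:port target is not mistaken for an absolute URL, since java.net.URI would otherwise parse the host name as a scheme.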
Note that jwarc's WarcServer hasn't been well tested and lacks important features like a proper index and date-selection UI in proxy mode. It's more of a proof of concept / demo. I would currently recommend pywb's proxy mode instead for most users.
Demonstration that script injection doesn't happen when used in proxy mode:
$ jwarc fetch http://www.example.org/ > /tmp/example.warc
$ jwarc serve /tmp/example.warc &
Listening on port 8080
$ curl --proxy http://localhost:8080 http://www.example.org/
<!doctype html>
<html>
<head>
<title>Example Domain</title>
But it does when used in normal replay mode:
$ curl http://localhost:8080/replay/20230101000000/http://www.example.org/
<!doctype html><script src='/__jwarc__/inject.js'></script>
Thanks @ato,
I fixed it, perfect thanks for the tips.
Hi,
when running jwarc as a replay proxy, is there a way to disable the service worker script injection? Looking at the source code of the WarcServer class, I would like to know whether it would be possible to add a parameter to the GET request for "replay" that changes the value of the "proxy" argument. Currently the replay method is always called with "proxy" set to false (line 112).
Thanks