internetarchive / warcprox

WARC writing MITM HTTP/S proxy

Help getting started? #110

hanoii opened this issue 5 years ago (status: open)

hanoii commented 5 years ago

I am researching several tools and wanted to give this one a try. I am on OS X and installed both the stable release and the master branch with pip3, but I can't get it to work with the latest Chrome or Firefox. I always get:

```
2019-01-09 18:04:32,949 12768 WARNING MitmProxyHandler(tid=n/a,started=2019-01-09T21:04:32.941738,client=127.0.0.1:64026) warcprox.warcproxy.WarcProxy.handle_error(warcproxy.py:535) exception processing request <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0> from ('127.0.0.1', 64026)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/warcprox/mitmproxy.py", line 659, in _process_request_thread
    request = self.finish_request(request, client_address)
  File "/usr/local/lib/python3.7/site-packages/warcprox/mitmproxy.py", line 606, in finish_request
    req_handler = self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.7/site-packages/warcprox/mitmproxy.py", line 222, in __init__
    http_server.BaseHTTPRequestHandler.__init__(self, request, client_address, server)
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socketserver.py", line 720, in __init__
    self.handle()
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/server.py", line 426, in handle
    self.handle_one_request()
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/server.py", line 414, in handle_one_request
    method()
  File "/usr/local/lib/python3.7/site-packages/warcprox/mitmproxy.py", line 340, in do_CONNECT
    self.handle_one_request()
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/server.py", line 394, in handle_one_request
    self.raw_requestline = self.rfile.readline(65537)
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1052, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 911, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: SSLV3_ALERT_BAD_CERTIFICATE] sslv3 alert bad certificate (_ssl.c:2484)
```

This happens with both Google and Facebook; I am mostly interested in Facebook.

I also couldn't find the CA certificate to export and add as a trusted certificate.

Would this store private content, such as access-restricted pages and comments, from Facebook? What about content loaded through user interaction?

nlevitt commented 5 years ago

By default warcprox saves the CA cert in ./{hostname}-warcprox-ca.pem, or loads it from there if the file already exists. You can specify a different file with the --cacert option. You could try adding that file as a trusted CA in your browser.
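The reason the CA file matters is that any client talking to warcprox has to trust it, otherwise the client rejects the certificates warcprox generates on the fly, which is what the "bad certificate" alert in the traceback reflects. A minimal Python sketch of that idea, assuming warcprox is listening on `localhost:8000` (its default port) and fills in `{hostname}` with the value of `socket.gethostname()`; `default_ca_path` and `build_opener_via_warcprox` are hypothetical helpers for illustration, not part of warcprox:

```python
import socket
import ssl
import urllib.request
from typing import Optional


def default_ca_path() -> str:
    """Hypothetical helper: mirror the default CA location ./{hostname}-warcprox-ca.pem,
    assuming {hostname} is socket.gethostname()."""
    return f"./{socket.gethostname()}-warcprox-ca.pem"


def build_opener_via_warcprox(cafile: Optional[str],
                              proxy: str = "http://localhost:8000") -> urllib.request.OpenerDirector:
    """Hypothetical helper: a urllib opener that routes HTTP and HTTPS through
    warcprox (assumed at localhost:8000) and verifies HTTPS against `cafile`.
    With cafile=None it falls back to the system's default CAs."""
    # Trusting the warcprox CA is what lets the client accept the
    # per-site certificates warcprox presents for HTTPS sites.
    context = ssl.create_default_context(cafile=cafile)
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
        urllib.request.HTTPSHandler(context=context),
    )
```

For example, `build_opener_via_warcprox(default_ca_path()).open("https://example.com/")` would fetch a page through a running warcprox, provided the CA file has already been created. A browser needs the same trust relationship, which is why importing the .pem file as a trusted CA makes the error go away.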

Another option is to run Chrome with --ignore-certificate-errors.
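A small Python sketch of building that Chrome command line, assuming Chrome is installed at the usual macOS location and warcprox is on `localhost:8000`; `chrome_argv` and `MAC_CHROME` are hypothetical names. The `--proxy-server`, `--ignore-certificate-errors` and `--user-data-dir` switches are standard Chromium flags:

```python
import tempfile
from typing import List

# Assumption: default macOS install path for Google Chrome.
MAC_CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def chrome_argv(chrome: str = MAC_CHROME,
                proxy: str = "http://localhost:8000",
                profile_dir: str = "") -> List[str]:
    """Hypothetical helper: argv for a Chrome instance that sends its traffic
    through warcprox and skips certificate verification."""
    # Use a throwaway profile unless one is given, so the insecure flag and
    # anything typed into this browser stay out of your everyday profile.
    profile_dir = profile_dir or tempfile.mkdtemp(prefix="warcprox-chrome-")
    return [
        chrome,
        f"--proxy-server={proxy}",
        "--ignore-certificate-errors",
        f"--user-data-dir={profile_dir}",
    ]
```

Launch it with `subprocess.Popen(chrome_argv())`. The separate `--user-data-dir` also forces a new Chrome process; if Chrome is already running with your normal profile, command-line flags are typically ignored because the launch is handed off to the existing instance.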

All your browser traffic will be archived, including anything private. If you log in to Facebook while browsing through warcprox, your password will be stored in the WARC as well, so that's something to be aware of. Any traffic initiated as a result of user interaction will also be archived.
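To check what a capture actually contains, you can scan the WARC's `request` records, which hold what the browser sent, including submitted login forms. A stdlib-only Python sketch, assuming the form was posted as `application/x-www-form-urlencoded` and that the password field is named something like `pass`, `passwd` or `password` (those field names are an assumption; check the login form of the site you captured). `iter_warc_records` and `find_password_fields` are hypothetical helpers:

```python
import gzip
import re
from typing import BinaryIO, Dict, Iterator, List, Tuple

# Assumed password-like form-field names; adjust for the site you captured.
PASSWORD_FIELD = re.compile(rb"(?:^|&)(pass|passwd|password)=", re.IGNORECASE)


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (lower-cased WARC headers, content block) for each record."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line[:40]!r}")
        headers: Dict[str, str] = {}
        while True:
            raw = stream.readline()
            if not raw.strip():
                break  # a blank line ends the WARC header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers["content-length"]))


def find_password_fields(path: str) -> List[Tuple[str, str, str]]:
    """Return (record id, target URI, field name) for request records whose
    form-encoded body has a password-like field. The value is not returned."""
    opener = gzip.open if path.endswith(".gz") else open  # .warc.gz or plain .warc
    hits = []
    with opener(path, "rb") as stream:
        for headers, block in iter_warc_records(stream):
            if headers.get("warc-type") != "request":
                continue
            # A request block is the raw HTTP request: head, blank line, body.
            _, _, body = block.partition(b"\r\n\r\n")
            for match in PASSWORD_FIELD.finditer(body):
                hits.append((headers.get("warc-record-id", ""),
                             headers.get("warc-target-uri", ""),
                             match.group(1).decode("ascii")))
    return hits
```

This hand-rolled reader only handles well-formed records and URL-encoded form bodies (not multipart or JSON logins); for anything beyond a quick check, the `warcio` library's `ArchiveIterator` is a sturdier WARC parser.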

laurelin88 commented 3 years ago

> All your browser traffic will be archived, including anything private. If you log in to facebook while browsing with warcprox, your password will be stored in the warc as well, so that's something to be aware of. Any traffic that initiated as a result of user interaction will be archived.

I just came across this comment while reading up on brozzler and warcprox, and I wanted to ask: given that brozzler uses warcprox, does capturing pages while logged in with brozzler also store the credentials in the WARC? If so, is there a way to locate them inside the WARC?