harvard-lil / scoop

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
MIT License

Capture SSL certs from HTTP Proxy directly #138

Open matteocargnelutti opened 1 year ago

matteocargnelutti commented 1 year ago

The current implementation of SSL certs capture:

While the current setup is extremely convenient, it would be preferable, both conceptually and from a performance perspective, to pull the certificates directly from the proxy, especially since Portal gives us that flexibility.


Ideally the replacement is somewhat "drop in":

The captureCertificatesAsAttachmentTimeout option and the crip dependency would be removed.


Progress:

https://github.com/harvard-lil/scoop/pull/140

leppert commented 1 year ago

Here's the API: https://nodejs.org/api/tls.html#tlssocketgetpeercertificatedetailed

We'll end up with something in ScoopProxy, probably in this block, along the lines of

  onConnected (serverSocket, request) {
    const exchange = this.exchanges.find(ex => ex.requestParsed === request)
    const ip = serverSocket.remoteAddress
    const rule = this.findMatchingBlocklistRule(ip)
    if (rule) {
      serverSocket.destroy()
      this.blockRequest(request, ip, rule)
    } else if (exchange) {
      // Assumes serverSocket is a tls.TLSSocket for HTTPS targets; plain
      // sockets have no getPeerCertificate(), hence the optional call.
      exchange.cert = serverSocket.getPeerCertificate?.(true)
    }
  }
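
For reference, `getPeerCertificate(true)` returns a detailed object whose `raw` field holds the DER-encoded certificate and whose `issuerCertificate` field points to the issuer (for a self-signed root this may be a circular reference). Here is a minimal sketch of turning that object into a list of PEM strings, assuming we would want to store the chain in PEM form. `derToPem` and `peerCertificateChainToPem` are hypothetical helper names, not part of Scoop or Portal.

```javascript
// Hypothetical helper: wrap a DER-encoded certificate (Buffer) in PEM armor.
function derToPem (der) {
  const base64 = der.toString('base64')
  const lines = base64.match(/.{1,64}/g) ?? [] // PEM bodies wrap at 64 chars
  return [
    '-----BEGIN CERTIFICATE-----',
    ...lines,
    '-----END CERTIFICATE-----'
  ].join('\n')
}

// Hypothetical helper: walk the object returned by
// tlsSocket.getPeerCertificate(true) from leaf to root.
// Stops when a certificate repeats, since a self-signed root can
// reference itself through `issuerCertificate`.
function peerCertificateChainToPem (cert) {
  const seen = new Set()
  const pems = []
  let current = cert

  while (current && current.raw) {
    const key = current.raw.toString('base64')
    if (seen.has(key)) break
    seen.add(key)
    pems.push(derToPem(current.raw))
    current = current.issuerCertificate
  }

  return pems
}
```

Under these assumptions, the proxy hook could call `peerCertificateChainToPem(socket.getPeerCertificate(true))` once the TLS handshake has completed and attach the result to the exchange.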
matteocargnelutti commented 1 year ago

Update: Getting close, but stuck on hard-to-troubleshoot edge cases (https://github.com/harvard-lil/scoop/pull/140#issuecomment-1503646180).

As this feature is not broken at the moment: