internetarchive / warcprox

WARC writing MITM HTTP/S proxy

Separate WARC file for each request #179

Open · Yakabuff opened this issue 1 year ago

Yakabuff commented 1 year ago

Hi, is there a way to create a separate WARC file for each request?

For example, I have two browsers, both using warcprox as a proxy. Browser 1 sends a request to google.com, and Browser 2 sends a request to wikipedia.com. At the moment, I get a single WARC file containing data from both sites. Is there a way to save the responses in two separate WARC files, e.g. warcprox-wikipedia.warc and warcprox-google.warc?

Thanks

CC: @nlevitt

anjackson commented 1 year ago

This can be done by adding a `Warcprox-Meta` header to every request, with a `warc-prefix` that determines which WARC files the response is written to.

But note that it can sometimes be difficult to guarantee that every request includes the header, including requests made by in-browser service workers. See https://github.com/ukwa/webrender-puppeteer/issues/6
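A minimal Python sketch of the `warc-prefix` approach, using only the standard library. It assumes warcprox is listening on `localhost:8000` (its default port), and the helper names and the prefixes `warcprox-google` / `warcprox-wikipedia` are hypothetical, chosen to match the example in the question. The `Warcprox-Meta` header value is a JSON object. For HTTPS URLs the client would also have to trust warcprox's CA certificate, because warcprox intercepts the TLS connection in order to read the header and record the traffic.

```python
import json
import urllib.request

# Assumption: warcprox is running locally on its default port.
WARCPROX_PROXY = "http://localhost:8000"


def warcprox_meta_header(warc_prefix: str) -> dict:
    """Build a Warcprox-Meta header asking warcprox to write this
    request's records to WARC files named with `warc_prefix`."""
    return {"Warcprox-Meta": json.dumps({"warc-prefix": warc_prefix})}


def build_request(url: str, warc_prefix: str) -> urllib.request.Request:
    """Create a request carrying the Warcprox-Meta header (hypothetical helper)."""
    return urllib.request.Request(url, headers=warcprox_meta_header(warc_prefix))


def build_opener(proxy: str = WARCPROX_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends both http and https traffic through warcprox."""
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
```

Usage would look like `build_opener().open(build_request("http://google.com/", "warcprox-google"))` for one client and the same call with `"warcprox-wikipedia"` for the other, giving one set of WARC files per prefix. warcprox can still append several requests that share a prefix to the same file, so for strictly one file per request a client could use a unique prefix per request (for example a UUID), at the cost of many small files.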