internetarchive / warcprox

WARC writing MITM HTTP/S proxy

discussion: collaboration, 2.x, etc #19

Open nlevitt opened 8 years ago

nlevitt commented 8 years ago

On #18 @trifle wrote

That said, I think the fact that almost everyone seems to need to fork warcprox for their project is a sign that it might benefit from integrating changes back into the original project - at least that's what I'd love to see.

And @justinlittman wrote

Agree. I recently noticed that @nlevitt has a mess of changes underway in #17. @nlevitt -- care to comment on the roadmap for 2?

I'm opening this issue as a place to discuss these questions and anything relating to them.

nlevitt commented 8 years ago

I agree that it would be great to integrate more of people's improvements into warcprox. I'm actually not very aware of what people are doing with it. I know @justinlittman had a pull request open for a while. I looked at it some but never solidified my thoughts on it, sorry about that (and I guess now you're going in a different direction). I am very interested to find out more about people's local customizations.

trifle commented 8 years ago

Thanks, @nlevitt!

I started #18 because I've used code from several projects that use warcprox. Many people seem to go for a deep integration which results in low modularity and generally makes everything hard to re-use.

Ultimately the nicest thing to have would be a library that encapsulates warcprox and selenium and provides a clean API for capturing WARCs, screenshots and HTML from browser-rendered pages. I'd love to make one, but definitely lack the time to do so right now (PhD thesis...).

The low-hanging fruit would be to add some features to warcprox that enable such use cases:

I guess we can collect other ideas for enhancements that increase warcprox's utility in similar ways.

nlevitt commented 8 years ago

I'm developing 2.x in concert with brozzler (https://github.com/internetarchive/brozzler). Lots of new features and bug fixes in there. I'm not sure when 2.x will be promoted to master. I would be very happy if people want to use 2.x, at this early stage, and I'm interested in people's thoughts about it.

nlevitt commented 8 years ago

@trifle Your idea for a "library that encapsulates warcprox and selenium..." sounds sort of like brozzler+warcprox, but not exactly. I have plans to create a project that pulls in brozzler and warcprox together, for two main purposes: quick start for users to get a little brozzler "cluster" up, and as a harness for integration tests. I wonder if that could meet your needs.