Open nlevitt opened 8 years ago
I agree that it would be great to integrate more of people's improvements into warcprox. I'm actually not very aware of what people are doing with it. I know @justinlittman had a pull request open for a while. I looked at it some but never solidified my thoughts on it, sorry about that (and I guess now you're going in a different direction). I am very interested to find out more about people's local customizations.
Thanks, @nlevitt!
I started #18 because I've used code from several projects that use warcprox. Many people seem to go for a deep integration, which results in low modularity and generally makes everything hard to re-use.
Ultimately the nicest thing to have would be a library that encapsulates warcprox and selenium and provides a clean API for capturing WARCs, screenshots and HTML from browser-rendered pages. I'd love to make one, but definitely lack the time to do so right now (PhD thesis...).
The low-hanging fruit would be to add some features to warcprox that enable such use cases:
I guess we can collect other ideas for enhancements that increase warcprox's utility in similar ways.
I'm developing 2.x in concert with brozzler (https://github.com/internetarchive/brozzler). Lots of new features and bug fixes in there. I'm not sure when 2.x will be promoted to master. I would be very happy if people want to use 2.x, at this early stage, and I'm interested in people's thoughts about it.
@trifle Your idea for a "library that encapsulates warcprox and selenium..." sounds sort of like brozzler+warcprox, but not exactly. I have plans to create a project that pulls in brozzler and warcprox together, for two main purposes: quick start for users to get a little brozzler "cluster" up, and as a harness for integration tests. I wonder if that could meet your needs.
On #18 @trifle wrote
And @justinlittman wrote
I'm opening this issue as a place to discuss these questions and anything relating to them.