lightbody / browsermob-proxy

A free utility to help web developers watch and manipulate network traffic from their AJAX applications.
http://bmp.lightbody.net
Apache License 2.0
2.17k stars 656 forks source link

BrowserMob-Proxy architecture and future #144

Closed jekh closed 8 years ago

jekh commented 9 years ago

This has been discussed at various times, including in PR #17. Currently BMP is a pretty tightly wound monolith that bundles together the command-line REST services layer, plus the core BMP functionality (whitelisting/blacklisting, HAR capture, etc.), plus the actual proxying code itself (with Jetty 5, MITM socket code, etc.). This means that the entire REST services layer, with sitebricks and associated classes, are included as dependencies, even when only using browsermob-proxy in embedded mode. It also makes it slightly harder to implement a different BMP core, which I'm currently doing with littleproxy.

At the very least, it would be nice to have two separate modules:

Both of these modules would be separate jars and separate artifacts in Central.

Any thoughts on this? I'd be willing to take a crack at it, but I'm unsure of how a multi-module maven project should look, and how it would work with the maven release plugin. Any help/advice would be greatly appreciated.

lightbody commented 9 years ago

Hey we should talk about this and a number of other pull requests. Can you hit me up via email/Google Talk (patrick@lightbody.net)? I'm in the middle of writing up a longer explanation and plan, but I think the best thing for the project and community is:

1) Give you, @andyclark and @nite23 commit rights so that you don't need to to wait for me to get around to approving PRs.

2) Initiate a plan to wind down the BMP project.

3) Introduce a brand new project designed to replace BMP with a much more modern codebase.

I'll be writing up a blog post soon (today I hope) expanding in more detail my personal reasons, my thoughts on what the future technical direction looks like, and my commitment level to help with the transition and creation of a new project.

lightbody commented 9 years ago

I agree with this direction :)

http://blog.lightbody.net/post/108483852039/the-future-of-bmp

jekh commented 9 years ago

Thank you, Patrick. I'm excited to help push BMP into a new era. You've done a fantastic job maintaining this project, and based on the activity on github, Google Groups, SE, and elsewhere, you have contributed enormously to thousands of people's software development efforts. Thank you!

Here are my thoughts on some of the major next steps for the project. I'd love to get some feedback from @nite23, @andyclark, and the rest of the community on the direction they'd like to see BMP go. In no particular order, I'd like to work toward:

  1. A Maven Central release for recent bugfixes/enhancements
    • Complete the upgrade to Jackson 2 and the latest sitebricks (PR #145) under the existing architecture.
    • Have one final release to Maven Central with the current architecture, beta-10.
  2. Implementation of the new architecture
    • Come up with a suitable interface to describe the proxy capabilities, which will serve as the contract between the REST API layer and the embedded proxy core. Have both the Jetty and LittleProxy-based code implement this inferface. (I've done this to some extent already, but it needs significant refinement.)
    • Split the proxy into separate REST API and embedded/core modules, published as two separate jars.
    • Create a littleproxy-integration branch to work on the littleproxy integration. I've already implemented most of BMP's features in my fork, which can serve as the base of the littleproxy integration. There's still work to be done, though.
    • When the LittleProxy integration is advanced enough to be useful, merge it into the core module and add it as an optional dependency, controlled by a switch or JVM property.
  3. Enhance the REST API
    • Expose the new BMP functionality through the REST API.
    • (Should sitebricks be replaced with something else?)
  4. Unit test improvements -- I'd also like to put some effort into shoring up unit test coverage. In particular:
    • Add unit tests as needed to properly cover BMP features.
    • Reduce the number of unit tests that require firing up a Firefox instance to hit an external website. Avoid using a browser unless absolutely necessary, and keep as much test data 'internal' and version-controlled as possible. (I'm responsible for a large number of unit tests that use Firefox, in particular the HAR tests. It's the easy way out but it tends to break unit tests when external websites are unavailable or change their content.)
    • Use Travis CI as our continuous integration server.

Throughout the above, I think it makes sense to keep the existing Jetty implementation as the default, with the LittleProxy implementation as an optional "opt-in" implementation. Once we feel the LittleProxy version is solid enough, we can enable that by default and turn off the Jetty implementation. That will significantly reduce the amount of code we have (ditching all of Jetty 5, for example), as well as a number of runtime dependencies. Perhaps that final switch to LittleProxy could coincide with the first 2.0.0.Final release.

Another area I'd like to see development in is support for other languages, since many non-Java folks seem to rely on it. I only use BMP in Java and 'standalone' modes, so I'm not really sure what the Python/.NET/other integrations could really use.

I think we should be able to do the beta-10 release to Central anytime, unless anybody knows of any blocking defects we should resolve first. I'll open up a separate issue/PR for the interface contract shortly, to get the ball rolling on that.

lightbody commented 9 years ago

I think sounds like a great plan.

(Should sitebricks be replaced with something else?)

Yes! I recommend something like DropWizard. It's what I had hoped SIteBricks would have turned in to :)

I think we should be able to do the beta-10 release to Central anytime

Would you prefer me to do this quickly now or would it be better to use it as an opportunity to teach you the release process? It's actually quite easy once you have all the permissions set up (which the Maven folks have to do by hand the first time) and you have the SSH and gpg signing bits configure.

nite23 commented 9 years ago

Patrick, thanks for all the work you've done on the project so far. I believe it has great potential, and I'm glad to be a part of it.

Here are some of my thoughts on the project direction.

Come up with a suitable interface to describe the proxy capabilities, which will serve as the contract between the REST API layer and the embedded proxy core. Have both the Jetty and LittleProxy-based code implement this inferface. .. Merge it into the core module and add it as an optional dependency, controlled by a switch

I'm not sure if it's worth it to add a layer of abstraction to support pluggable proxy cores, if we plan to keep only the LittleProxy core in the future. Also, I think it would be good to do at least one stable release of BrowserMob, before we do any major changes to the architecture. So, I propose to create a branch for BrowserMob 3.0, and within it completely drop the Jetty 5.x functionality and replace with LittleProxy. In parallel we'd continue developing the 2.0 branch, with the current proxy core. All the work in 2.0 would then be focused on fixing bugs, ironing out the kinks, and in general making the project ready the final release. I'm willing to work on this; a stable release is a priority for my project. 3.0 could be released as soon as it achieves a feature parity with 2.0. If it works well enough, we can abandon the 2.0 branch altogether.

(Should sitebricks be replaced with something else?)

I think we should keep the REST component as lightweight as possible. Something like embedded Jetty + RESTEasy?

jekh commented 9 years ago

Sorry for the verbosity... This post turned out to be a lot longer than I anticipated. I'll split it up into separate issues if we need to expand the discussion of any particular section.

Core interface

I'm not sure if it's worth it to add a layer of abstraction to support pluggable proxy cores, if we plan to keep only the LittleProxy core in the future.

I completely agree that supporting pluggable proxy backends is neither feasible nor a goal of BMP. The idea behind the interface isn't really to allow pluggable cores, but rather:

  1. Reduce the development effort in the transition from bundled "REST+core" to separate modules. The interface would be the dividing line between REST and core, which will force better architecture on the REST module.
  2. Deprecate features that can't or won't be supported in a LittleProxy-based implementation, such as the StreamManager bandwidth control methods. (Instead of StreamManager, we can define a generic throttling interface that both LittleProxy and Jetty-proxy can support). The interface is our public commitment to support that functionality in future releases.
  3. Ease the transition from the Jetty-based proxy to a LittleProxy-based version. New users can code to the interface rather than the Jetty-bound ProxyServer method without jeopardizing future migration to LittleProxy. Existing users can modify their code to meet the interface's definitions to prepare for the transition while still using the stable Jetty implementation.

Also, I think it would be good to do at least one stable release of BrowserMob, before we do any major changes to the architecture. So, I propose to create a branch for BrowserMob 3.0, and within it completely drop the Jetty 5.x functionality and replace with LittleProxy. In parallel we'd continue developing the 2.0 branch, with the current proxy core. All the work in 2.0 would then be focused on fixing bugs, ironing out the kinks, and in general making the project ready the final release. I'm willing to work on this; a stable release is a priority for my project.

I think you're right, having a bugfix branch for the current arch makes sense, since some defects will probably come up that we want to fix before migrating to LP. I like the idea of 2.x releases supporting the Jetty backend, with later 2.x releases supporting optional LP, but 3.0 being exclusively LP. I do want to make clear to users that the Jetty-based module is strictly short-term only, in other words, "EOL'd" and basically feature-frozen. I think work on new features should happen in the LP implementation, and we won't commit to backporting any new features to the Jetty-based proxy.

Module structure

As I mentioned before, once a LP implementation is "beta-ready", I'd like it to be available even in the 2.x branch, so that people can "flip a switch" to use LP and test their use cases with the LP implementation. Since we're going to split up REST and core, we could actually have one Jetty module and one LP module, and make the LP module optional in later 2.x releases. That would let clients opt-in to LP by explicitly declaring it as a dependency. Something like:

With a project directory structure something like:

I'm open to other ideas though... Whatever the approach, the goal should be to make it as easy as possible for clients to test the LP implementation and make the migration to 3.x minimally intrusive.

REST library

I think we should keep the REST component as lightweight as possible. Something like embedded Jetty + RESTEasy?

The simpler the better. I use RESTEasy regularly and have been happy with it. Dropwzard also uses embedded Jetty, plus (I think) Jersey and Jackson. We need an example of what the REST layer would like like under the proposed library.

Roadmap

So, where does this leave us on the 2.0.0.Final release? Based on @nite23's suggestions, I'd like to propose the following:

  1. Release beta-10 immediately, as the 2.0.0 Release Candidate.
  2. Make a 2.0.0 branch right now, reserved only for bugfixes for the 2.0.0.Final and future 2.0.x releases. The major architectural undertakings discussed here will not be backported to the 2.0.x branch.
  3. Master becomes 2.1.0-beta-1, with the key goals of the 2.1 release being (1) separate rest and core modules; and (2) optional use of (beta) LP.

In the coming weeks, I'd like to define a roadmap for the 2.x releases and the transition to 3.x.

Side-note: Why LittleProxy?

There are some very good reasons I believe we should focus on LittleProxy and keep supporting the Jetty 5-based version only until LP is ready. The reasons are both practical and technical.

From a practical standpoint, we don't want to fragment BMP. Development will be slower and support will be harder when there are two active implementations.

On technical merits, LittleProxy has some killer advantages that are going to make BMP better for just about everyone. Some examples I've found based on my work so far:

nite23 commented 9 years ago

We need an example of what the REST layer would like like under the proposed library.

Here: https://github.com/nite23/browsermob-proxy/commit/474005499d535b2e3a9fe81dba1bc5fe331a4334 I tried to replace the current implementation with only minimal changes to existing code. Let me know what you think.

nite23 commented 9 years ago

once a LP implementation is "beta-ready", I'd like it to be available even in the 2.x branch, so that people can "flip a switch" to use LP and test their use cases with the LP implementation.

I agree that this would be useful, but what if the act of "flipping a switch" was done by changing the version of the artifact in pom.xml? As in:

I think the advantage would be that we'd have simpler project structure and cleaner code, but the disadvantage would be that we'd have to freeze the interface between rest and core early on - at least the methods that are common between 2.x and 3.x branches

jekh commented 9 years ago

Here: nite23@4740054 I tried to replace the current implementation with only minimal changes to existing code. Let me know what you think.

This looks really good. Have you been using this in a local build? Once we get 2.0.0-beta10 into Central and off in its own branch, I think it would be great to get this ready to merge.

I agree that this would be useful, but what if the act of "flipping a switch" was done by changing the version of the artifact in pom.xml? As in:

browsermob-rest (2.1.x)
    browsermob-core 2.1.x (Jetty-based)
    browsermob-core 3.x (LP-based)

browsermob-rest (3.x)
    browsermob-core 3.x (LP-based)

I would actually prefer that. I can't figure out the logistics of it, partially because of the maven release plugin's versioning scheme, and partially because of the shared code between the Jetty and LP implementations. I'm not a maven release expert (major understatement!), but my impression is that maven does not like multiple versions of the same (sub)module within the same project. We might be able to remedy that by moving to gradle, but I'm not sure how folks would feel about that.

On top of that, having both 2.1.x and 3.x versions of browsermob-core will make it almost impossible to create a -bin.zip release that allows standalone users to test against a beta LP implementation (something like ./browsermob-proxy --use-littleproxy). They would have to manually modify the POM and rebuild.

To complicate matters further, there will still be a lot of shared code between LP and Jetty implementations, in utility classes, shared objects (e.g. Har classes, Black/Whitelist, etc.), and of course the interface. If we keep the shared classes in browsermob-core, then make browsermob-core-littleproxy depend on browsermob-core in the 2.1.x version, we wouldn't need to duplicate code. That would avoid having to habitually cherry-pick changesets between the 2.1.x and 3.x branches. It would also ease the transition from 2.1.x to 3.0.x, because there would be less variation in the core classes between the versions (e.g. for people who use the Har classes directly).

The problem might just be my lack of imagination though. What's your thinking on the logistics of developing 2.1.x and 3.x in parallel?

jekh commented 8 years ago

Now that we're solidly established with LittleProxy and on our way to a new REST API, I'll close this issue. Thanks everyone!