webmetrics / browsermob-proxy

NOTICE: this project has been forked and is being maintained at https://github.com/lightbody/browsermob-proxy
Apache License 2.0

[BrowserMobHttpClient.java]Capture Content in beta 8 #85

Open d-jubeau opened 11 years ago

d-jubeau commented 11 years ago

I am running into trouble because some of the HAR files I produce are larger than 10 MB. After browsing the code, I would like to draw your attention to this piece of code:

BrowserMobHttpClient.java, around line 736 (in beta 8, not released):

if (contentType != null && contentType.startsWith("text/")) {
        entry.getResponse().getContent().setText(new String(copy.toByteArray()));
} else { 
        entry.getResponse().getContent().setText(Base64.byteArrayToBase64(copy.toByteArray()));
}

I think there are two issues here:

I need this for a student project and will probably make the changes myself, but I would be glad to hear your opinions and to know whether publishing the changes could help someone.

lightbody commented 11 years ago

I agree that we should not Base64 encode application/javascript. If you can submit a pull request that supports additional "plain text" content types, I would be glad to accept it.
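
For illustration, a check for additional "plain text" content types could look roughly like the sketch below. The class name `TextContentTypes`, the method `isTextual`, and the particular list of MIME types are assumptions made for this example and are not part of browsermob-proxy.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of browsermob-proxy: decides whether a
// response body should be stored as plain text instead of Base64.
final class TextContentTypes {

    // Assumed list of non-"text/" MIME types that still carry text.
    private static final Set<String> TEXTUAL = Set.of(
            "application/javascript",
            "application/x-javascript",
            "application/json",
            "application/xml");

    private TextContentTypes() {
    }

    static boolean isTextual(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        int semicolon = contentType.indexOf(';');
        String mime = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return mime.startsWith("text/") || TEXTUAL.contains(mime);
    }
}
```

With a helper like this, the condition in the quoted snippet could become `TextContentTypes.isTextual(contentType)` in place of the `startsWith("text/")` test.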

You're right that HAR files can get VERY large when you start capturing the content of every request. I'm open to ideas on how to limit that, such as configuration for limiting the size of each body or limiting content capture to certain URLs or file types.
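
As a rough sketch of the per-body size limit idea, the captured bytes could be truncated before they are written into the HAR entry. The class `BodyCapture`, the method `encodeBody`, and the `maxBytes` parameter are hypothetical names for this example; the sketch also assumes UTF-8 for text bodies, which the original code does not specify.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of browsermob-proxy: caps the number of
// body bytes that end up in a HAR entry.
final class BodyCapture {

    private BodyCapture() {
    }

    static String encodeBody(byte[] body, boolean textual, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        // Keep at most maxBytes bytes; shorter bodies are used unchanged.
        byte[] kept = body.length > maxBytes ? Arrays.copyOf(body, maxBytes) : body;
        if (textual) {
            // Assumption: text bodies are UTF-8. Cutting in the middle of a
            // multi-byte character yields a replacement character at the end.
            return new String(kept, StandardCharsets.UTF_8);
        }
        return Base64.getEncoder().encodeToString(kept);
    }
}
```

Truncating before encoding keeps the limit predictable for binary content, since Base64 output is about a third larger than its input.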

d-jubeau commented 11 years ago

I made some changes: https://github.com/d-jubeau/browsermob-proxy/commit/93f430ab7bb5b539626c243eee46d27cd23d30b6 We are currently running tests on several hundred websites, and it seems to work well so far.

roydekleijn commented 11 years ago

Hi Patrick,

Capturing body content for particular URLs would be really nice. Maybe we could add a regex parameter to the relevant method.
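
A minimal sketch of such a regex parameter, assuming a small filter object that the capture code would consult before storing a body. `CaptureFilter` and `shouldCapture` are hypothetical names, not existing browsermob-proxy API.

```java
import java.util.regex.Pattern;

// Hypothetical filter, not part of browsermob-proxy: body content is
// captured only for URLs that match the configured regular expression.
final class CaptureFilter {

    private final Pattern urlPattern;

    CaptureFilter(String urlRegex) {
        // Compile once so the pattern is reused for every request.
        this.urlPattern = Pattern.compile(urlRegex);
    }

    boolean shouldCapture(String url) {
        // find() allows partial matches; anchor the regex with ^ and $
        // when the whole URL has to match.
        return url != null && urlPattern.matcher(url).find();
    }
}
```

For example, `new CaptureFilter("^https?://example\\.com/api/")` would accept `http://example.com/api/users` and skip `http://example.com/static/logo.png`.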