Include content text in response

richard1 commented 6 years ago

entry.response.content.text is missing, so the HAR is missing the response body.

It looks like the Chrome DevTools Protocol allows querying for the response body by requestId, so this could be an approach.
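A minimal sketch of that approach, assuming a CDP client is already attached to the page. `send` is a hypothetical stand-in for whatever client is in use (something shaped like `send(method, params)` returning a promise), and `fetchHarContent` is an illustrative name, not part of chrome-har. `Network.getResponseBody` returns `body` and `base64Encoded`, which line up with the HAR `content.text` and `content.encoding` fields.

```javascript
// Sketch only: `send` is a hypothetical CDP client function, injected by the caller.
// It must accept (method, params) and return a promise of the command result.
async function fetchHarContent(send, requestId) {
  // Network.getResponseBody resolves to { body, base64Encoded }.
  const { body, base64Encoded } = await send('Network.getResponseBody', { requestId });

  const content = { text: body };
  // HAR marks payloads that are not plain text with encoding: "base64".
  if (base64Encoded) {
    content.encoding = 'base64';
  }
  return content;
}

// Usage with a fake client, so the sketch runs without a browser.
const fakeSend = async (method, params) => ({ body: '<html></html>', base64Encoded: false });
fetchHarContent(fakeSend, 'example-request-id').then((content) => console.log(content));
```

The request must still be known to the browser when the command is sent, so in practice the call would probably be made shortly after the request finishes loading.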

soulgalore commented 6 years ago

Hi @richard1, I like the idea of getting the content, but it needs to be done in Browsertime. We can now access many things with the DevTools protocol, but I think (if I remember correctly) that we need to upgrade to Selenium 4 to get it in our NodeJS Selenium.

What's your use case, @richard1? I want the plain HTML myself and don't care about the rest of the content, so maybe we should make it configurable what to get/keep?

Best Peter

richard1 commented 6 years ago

Hi @soulgalore - thanks for the quick response!

Yes, my use case is mainly the plain HTML as well. Ideally I'd also like external JavaScript and JSON resources to be visible, so it'd be great to make it configurable which responses to keep data for.

soulgalore commented 6 years ago

Ok, cool. For Firefox we have either all or none, but let me change that next week in Browsertime and then aim to do the same for Chrome when we get the opportunity.

joshuabuildsthings commented 6 years ago

@soulgalore - Would love to see this feature as well. Is it still on the roadmap?

soulgalore commented 6 years ago

@joshuabuildsthings well, I still think it would be great, but we haven't jacked in the functionality in Browsertime yet. If I understand correctly, we will not be able to do it standalone in Chrome-HAR; we will need to get it through the Chrome API. We also need to create an issue in Browsertime (I haven't done that yet).

rvbyron commented 6 years ago

Getting the response.content.text optionally would be very helpful. Is there any progress here?

soulgalore commented 6 years ago

@rvbyron (still) waiting on Selenium 4 to be released as stable to be able to get the info from Chrome using Browsertime (using sendDevToolsCommand to the driver).

Fohlen commented 6 years ago

This would be extremely useful. However, it is already possible to capture these events when you use puppeteer (my current workflow). What can be done to move the implementation forward? Should somebody start a PR?

Fohlen commented 6 years ago

As far as I can tell, one would need to include the response data here: https://github.com/sitespeedio/chrome-har/blob/master/index.js#L462. Should I open a PR to get it rolling?

soulgalore commented 6 years ago

@Fohlen yes please do! Also, make a test case with an attached trace file. We also need to add a property so we switch on/off the functionality since this can make the HAR file huge.

Fohlen commented 6 years ago

@soulgalore I actually realised that response.body is not part of the actual Chrome DevTools Protocol specification (see https://chromedevtools.github.io/devtools-protocol/tot/Network#type-Response). However it can be accessed via the API (https://chromedevtools.github.io/devtools-protocol/1-3/Network#method-getResponseBody). What I will do now is add a response.body property that arbitrarily maps towards the entry.response.content.text in the HAR. Is that OK?

soulgalore commented 6 years ago

Hi @Fohlen, sorry, I missed answering. So you add a mapping, and then in puppeteer you will add those fields, so if it's there, we will use it? Yep, works fine, just make sure you add a CLI parameter so getting the body is turned off by default (keeping the old behavior), ok?
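A rough sketch of the mapping discussed above, assuming the caller has already attached a `body` property to the response object. Both `applyResponseBody` and the `includeBody` option are hypothetical names used for illustration, not existing chrome-har API; the point is only that the body is copied when it is present and the feature is explicitly switched on.

```javascript
// Sketch only: `includeBody` and `response.body` are hypothetical names.
function applyResponseBody(entry, response, options = {}) {
  // Off unless explicitly enabled, so existing HAR output stays the same.
  const includeBody = options.includeBody === true;
  if (!includeBody || typeof response.body !== 'string') {
    return entry;
  }
  // Map the attached body onto the HAR field that is currently missing.
  entry.response.content.text = response.body;
  return entry;
}

// Usage: the same entry with the option left off and switched on.
const entry = { response: { content: { size: 13, mimeType: 'text/html' } } };
applyResponseBody(entry, { body: '<html></html>' });
console.log('text' in entry.response.content); // false
applyResponseBody(entry, { body: '<html></html>' }, { includeBody: true });
console.log(entry.response.content.text); // <html></html>
```

Keeping the default off matters because bodies can make the HAR file very large.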

rvbyron commented 6 years ago

If you want to get fancy, returning text based on a comma-separated list of mimeType values would allow us to retrieve text for html, css, json and javascript files, while eliminating jpeg and png files. However, I'll be ecstatic to get text regardless, and I can trim the binary-as-text data I don't need.

joshuabuildsthings commented 6 years ago

Being able to choose mimeType would be great. I wouldn't necessarily want to totally exclude images though, as I could also see situations where getting the binary data would be useful; for example, auto minification pipelines.

rvbyron commented 6 years ago

@joshuabuildsthings I'm not in any way suggesting to always exclude binary data. I was saying that an include parameter containing a list of mimeTypes would allow finer control to save bandwidth/memory. It would default to all mimeTypes if you don't specify it. An exclude parameter could make things easy too. Then, to top it all off, those could be regular expressions in case you want to include/exclude all images (e.g. exclude='image/.*').
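To make the proposal concrete, here is a small sketch of such a filter. The `include` and `exclude` option names and the `shouldKeepBody` helper are hypothetical; each option is assumed to be a comma-separated list of regular expressions matched against the whole mimeType, with an empty include list meaning "all mimeTypes".

```javascript
// Sketch only: `include` / `exclude` are hypothetical option names.
// Turn "text/html, image/.*" into anchored regular expressions.
function toPatterns(list) {
  return (list || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean)
    .map((s) => new RegExp(`^(?:${s})$`));
}

function shouldKeepBody(mimeType, { include = '', exclude = '' } = {}) {
  const includes = toPatterns(include);
  const excludes = toPatterns(exclude);
  // Exclusions are checked first, so they override any include match.
  if (excludes.some((re) => re.test(mimeType))) {
    return false;
  }
  // No include list means every remaining mimeType is kept.
  return includes.length === 0 || includes.some((re) => re.test(mimeType));
}

console.log(shouldKeepBody('image/png', { exclude: 'image/.*' })); // false
console.log(shouldKeepBody('text/html', { exclude: 'image/.*' })); // true
console.log(shouldKeepBody('image/jpeg', { include: 'text/html,application/json' })); // false
```

One limitation of splitting on commas is that a pattern cannot itself contain a comma (for example a `{1,2}` quantifier); passing an array of patterns would avoid that.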

As for binary data, (you might already be aware) if you go to the debug->network tab and save a har file with content there, you will see that it indeed saves all content including binary data to the text field. So, there is precedent there.

joshuabuildsthings commented 6 years ago

@rvbyron - Sorry for confusion. What you're proposing sounds ideal all around.

rvbyron commented 6 years ago

@soulgalore So, it looks like there was talk above about how to implement this feature. Has it been implemented? If so, can you give an example of how to access it? It would be a very handy feature in certain cases.

soulgalore commented 6 years ago

Hey @rvbyron, @Fohlen said he could maybe implement it. For me to implement it, I'm still waiting on Selenium 4; see https://github.com/sitespeedio/chrome-har/issues/8#issuecomment-430381519

Fohlen commented 5 years ago

hey @soulgalore sorry for not checking back with you in a while. So implementing is quite trivial, but our stack actually moved away from HAR and thus I don't have the time and urge to implement and maintain these changes. @rvbyron if you want to implement you can have a look at my puppeteer code which should give you a fair sample on how it works.

AgainPsychoX commented 5 years ago

Aren't #41 and #42 fixing it?

soulgalore commented 5 years ago

Maybe? I haven't used it with puppeteer. I've added support in Browsertime from 5.0, but there the content is added to the HAR from the outside, using CDP to get it.
