richard1 opened this issue 6 years ago
Hi @richard1, I like the idea of getting the content, but it needs to be done in Browsertime. The content can be accessed through the DevTools protocol, but if I remember correctly we need to upgrade to Selenium 4 to get it in our NodeJS Selenium setup.
What's your use case, @richard1? I want the plain HTML and don't care about the rest of the content, so maybe we could make it configurable what to get/keep?
Best, Peter
Hi @soulgalore - thanks for the quick response!
Yes, my use case is mainly the plain HTML as well. Ideally I'd also like to have external JavaScript and JSON resources visible, so it'd be great to make it configurable which responses to keep data for.
Ok, cool. For Firefox we have either all or none, but let me change that next week in Browsertime and then aim to do the same for Chrome when we get the opportunity.
@soulgalore - Would love to see this feature as well. Is it still on the roadmap?
@joshuabuildsthings Well, I still think it would be great, but we haven't added the functionality to Browsertime yet. If I understand correctly, we will not be able to do it standalone in Chrome-HAR; we will need to get the content through the Chrome API, so we need to create an issue in Browsertime (I haven't done that yet).
Getting `response.content.text` optionally would be very helpful. Is there any progress here?
@rvbyron We're still waiting for Selenium 4 to be released as stable so we can get the info from Chrome in Browsertime (using `sendDevToolsCommand` on the driver).
This would be extremely useful. However, it is already possible to capture these events when you use puppeteer (my current workflow). What can be done to move the implementation forward? Should somebody start a PR?
As far as I can tell, one would need to include the response data here: https://github.com/sitespeedio/chrome-har/blob/master/index.js#L462
Should I open a PR to get it rolling?
@Fohlen Yes, please do! Also, make a test case with an attached trace file. We also need to add a property so we can switch the functionality on/off, since this can make the HAR file huge.
@soulgalore I actually realised that `response.body` is not part of the actual Chrome DevTools Protocol specification (see https://chromedevtools.github.io/devtools-protocol/tot/Network#type-Response). However, it can be accessed via the API (https://chromedevtools.github.io/devtools-protocol/1-3/Network#method-getResponseBody). What I will do now is add a custom `response.body` property that maps to `entry.response.content.text` in the HAR. Is that OK?
Hi @Fohlen, sorry, I missed answering. So you add a mapping, and then in puppeteer you will add those fields, so if they're there, we will use them? Yep, that works fine. Just make sure you add a CLI parameter so getting the body is turned off by default (keeping the old behavior), OK?
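The mapping agreed on here (a `response.body` field, when present, becomes `entry.response.content.text`, and the whole thing is off by default) can be sketched as follows. This is a minimal sketch, not chrome-har's actual code: the option name `includeResponseBodies`, the function `applyResponseBody` and the `CapturedResponse` shape are hypothetical names chosen for illustration.

```typescript
// Only the HAR 1.2 fields this sketch touches.
interface HarContent {
  size: number;
  mimeType: string;
  text?: string;
}

interface HarEntry {
  response: { content: HarContent };
}

// Assumed (hypothetical) shape: the puppeteer side attaches the fetched
// body as `body` on the response it hands to the HAR builder.
interface CapturedResponse {
  body?: string;
}

// Hypothetical option name; absent or false keeps the old behavior.
interface BodyOptions {
  includeResponseBodies?: boolean;
}

// Returns a new entry with response.content.text copied from captured.body,
// but only when the option is on and a body was actually captured.
// The input entry is never mutated.
function applyResponseBody(
  entry: HarEntry,
  captured: CapturedResponse,
  options: BodyOptions = {}
): HarEntry {
  if (!options.includeResponseBodies || captured.body === undefined) {
    return entry; // old behavior: no content.text
  }
  return {
    ...entry,
    response: {
      ...entry.response,
      content: { ...entry.response.content, text: captured.body },
    },
  };
}
```

Keeping the flag opt-in means existing callers get the same entries as before, and only those who ask for bodies pay the cost in HAR size.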
If you want to get fancy, returning `text` based on a comma-separated list of `mimeType`s would allow us to retrieve `text` for HTML, CSS, JSON and JavaScript files while eliminating JPEG and PNG files. However, I'll be ecstatic to get `text` regardless, and I can trim the binary-converted-to-text data I don't need.
Being able to choose the `mimeType` would be great. I wouldn't necessarily want to totally exclude images though, as I could also see situations where getting the binary data would be useful, for example auto-minification pipelines.
@joshuabuildsthings I'm not in any way suggesting always excluding binary data. I was saying that an `include` parameter containing a list of `mimeType`s would allow more fine-grained control to save bandwidth/memory. It would default to all `mimeType`s if you don't specify it. An `exclude` parameter could make things easy too. Then, to top it all off, those could be regular expressions in case you want to `include`/`exclude` all images (e.g. `exclude='image/.*'`).
As for binary data (you might already be aware of this): if you go to the DevTools Network tab and save a HAR file with content, you will see that it indeed saves all content, including binary data, to the `text` field. So there is precedent there.
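The include/exclude proposal can be read as code roughly like this. It is a minimal sketch, assuming hypothetical `include`/`exclude` options that each take a comma-separated list of `mimeType` patterns, with every pattern treated as a case-insensitive regular expression matched against the whole base type (parameters such as `; charset=utf-8` are stripped first). `makeMimeFilter` and the option names are illustrative, not an existing chrome-har option.

```typescript
// Hypothetical options: comma-separated lists of mimeType regex patterns.
interface MimeFilterOptions {
  include?: string; // e.g. "text/.*,application/json"; omitted means all types
  exclude?: string; // e.g. "image/.*"
}

// Turns "a,b" into regexes that must match the whole mimeType.
function toRegexes(list: string | undefined): RegExp[] {
  if (!list) return [];
  return list
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => new RegExp(`^(?:${s})$`, "i"));
}

// Returns a predicate deciding whether to keep the body for a given mimeType.
function makeMimeFilter(options: MimeFilterOptions): (mimeType: string) => boolean {
  const include = toRegexes(options.include);
  const exclude = toRegexes(options.exclude);
  return (mimeType: string): boolean => {
    // Drop parameters such as "; charset=utf-8" before matching.
    const base = (mimeType.split(";")[0] ?? "").trim();
    const included = include.length === 0 || include.some((re) => re.test(base));
    const excluded = exclude.some((re) => re.test(base));
    return included && !excluded;
  };
}
```

For example, `makeMimeFilter({ exclude: "image/.*" })` keeps HTML, CSS, JSON and JavaScript bodies but drops PNG and JPEG ones. Splitting on commas means a pattern cannot itself contain a comma (such as a `{1,2}` quantifier); taking an array of patterns instead would avoid that if it ever matters.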
@rvbyron Sorry for the confusion. What you're proposing sounds ideal all around.
@soulgalore It looks like there was discussion above about how to implement this feature. Has it been implemented? If so, can you give an example of how to access it? It would be a very handy feature in certain cases.
Hey @rvbyron, @Fohlen said he might be able to implement it. As for me implementing it, I'm still waiting on Selenium 4; see https://github.com/sitespeedio/chrome-har/issues/8#issuecomment-430381519
Hey @soulgalore, sorry for not checking back with you in a while. Implementing it is quite trivial, but our stack has actually moved away from HAR, so I don't have the time or the urge to implement and maintain these changes. @rvbyron, if you want to implement it, you can have a look at my puppeteer code, which should give you a fair sample of how it works.
Aren't #41 and #42 fixing it?
Maybe? I haven't used it with puppeteer. I added support in Browsertime from 5.0, but there we add the content to the HAR from the outside, using CDP to get it.
`entry.response.content.text` is missing, so the HAR is missing the response body. It looks like the Chrome DevTools Protocol allows querying for the response body by `requestId`, so this could be an approach.
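The `requestId` approach could look roughly like this. It is a minimal sketch, assuming a `send` function with the call shape of a CDP session's `send()` (a puppeteer `CDPSession` can be wrapped to fit); `withResponseBody` and `CdpSend` are hypothetical names. It calls `Network.getResponseBody`, whose result carries `body` and `base64Encoded`, and marks base64 text with HAR 1.2's `encoding: "base64"`, which is also how binary bodies end up in the `text` field. In practice the body is usually requested after `Network.loadingFinished` for that `requestId`.

```typescript
// Only the HAR 1.2 content fields this sketch touches.
interface HarContent {
  size: number;
  mimeType: string;
  text?: string;
  encoding?: string; // "base64" when text holds base64-encoded data
}

// Result shape of CDP's Network.getResponseBody.
interface GetResponseBodyResult {
  body: string;
  base64Encoded: boolean;
}

// Hypothetical stand-in for a CDP session's send() for this one method.
type CdpSend = (
  method: "Network.getResponseBody",
  params: { requestId: string }
) => Promise<GetResponseBodyResult>;

// Fetches the body for requestId and returns a copy of `content` with text set.
async function withResponseBody(
  send: CdpSend,
  requestId: string,
  content: HarContent
): Promise<HarContent> {
  try {
    const { body, base64Encoded } = await send("Network.getResponseBody", { requestId });
    return base64Encoded
      ? { ...content, text: body, encoding: "base64" }
      : { ...content, text: body };
  } catch {
    // The body may no longer be available in the browser; leave the entry
    // without text rather than failing the whole HAR.
    return content;
  }
}
```

Swallowing the error per entry is a design choice: one evicted or unavailable body should not prevent the rest of the HAR from being written.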