Closed lapp0 closed 3 years ago
The recent change to support websockets allows Selenium Wire to pass through websocket data being transferred back and forth, but Selenium Wire doesn't yet capture this data. Prior to the change, websocket connections wouldn't work at all and certain page functionality would break.
We need to look at what would be involved in capturing the websocket frames and exposing them via a suitable driver attribute.
I looked a bit into it. The fact that the responses are pickled presents some difficulty here. Is there any reason we need to pickle responses and save? Can't we just keep the objects in memory?
The idea is that all request and response data is immediately pickled to disk on capture, and Selenium Wire keeps just an index of this data in memory. It does this largely for scalability reasons as the volume of captured data could be very large, particularly if capture occurs over an extended time period.
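As a rough illustration of that design (this is not Selenium Wire's actual code; the class and method names are hypothetical), each captured item is pickled to its own file and only an id-to-path index stays in memory:

```python
import os
import pickle
import tempfile
import uuid


class DiskBackedStore:
    """Hypothetical sketch: pickle each captured item to disk, keep only an index in memory."""

    def __init__(self, base_dir=None):
        # Assumption: a fresh temporary directory is an acceptable default location.
        self._dir = base_dir or tempfile.mkdtemp(prefix="capture-")
        self._index = {}  # item id -> path of the pickled file

    def save(self, item):
        """Write the item to disk straight away and remember only where it went."""
        item_id = str(uuid.uuid4())
        path = os.path.join(self._dir, item_id + ".pkl")
        with open(path, "wb") as fh:
            pickle.dump(item, fh)
        self._index[item_id] = path
        return item_id

    def load(self, item_id):
        """Read the item back from disk on demand."""
        with open(self._index[item_id], "rb") as fh:
            return pickle.load(fh)

    def __len__(self):
        return len(self._index)
```

Memory use then grows with the number of captured items rather than with the size of their bodies, which is the scalability point being made here.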
We could look at some redesign here however if it would aid websocket data capture. Did you have any ideas on an approach?
Because websockets can continue to receive messages after the request is "complete", I think there are three things we might do here:
1) simplest working solution: offer an "in memory mode" with no disk writing, and document that websocket messages can only be read if this mode is enabled, otherwise they'll be an empty list
Would love to hear what you think.
Many thanks for the proposals. I wonder whether we should try solution 1) first, as that sounds like the easiest to get off the ground.
I guess we could create a InMemoryRequestStorage class which could be swapped with the current RequestStorage depending upon whether memory mode is enabled or not?
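To make the idea concrete, here is a minimal sketch of what such a swap might look like. Only the name InMemoryRequestStorage comes from this thread; the method names and the make_storage helper are hypothetical, and the existing RequestStorage is represented by a factory argument rather than imported.

```python
class InMemoryRequestStorage:
    """Hypothetical sketch; method names are illustrative, not Selenium Wire's API."""

    def __init__(self):
        self._requests = {}   # request id -> request object
        self._responses = {}  # request id -> response object
        self._messages = {}   # request id -> list of websocket messages

    def save_request(self, request_id, request):
        self._requests[request_id] = request
        self._messages[request_id] = []

    def save_response(self, request_id, response):
        self._responses[request_id] = response

    def add_ws_message(self, request_id, message):
        # Messages may arrive long after the handshake response was stored,
        # which is why they are kept in a live list rather than a one-off snapshot.
        self._messages[request_id].append(message)

    def load_ws_messages(self, request_id):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._messages.get(request_id, []))


def make_storage(in_memory, disk_storage_factory):
    """Pick a backend; disk_storage_factory stands in for the existing RequestStorage."""
    return InMemoryRequestStorage() if in_memory else disk_storage_factory()
```

Because nothing is serialised, objects stay live and messages can be appended after the request is "complete", at the cost of unbounded memory growth during long captures.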
I'm away currently but can look at it when I get back. Or if you have time to have a play yourself and make a PR feel free.
I guess we could create a InMemoryRequestStorage class
Funny enough, that's exactly what I named the class in the linked WIP PR.
I have changes in a PR, but I'm struggling with a bug where Request._body is not being set. I've tried to figure out what's going wrong, but couldn't. Please let me know if you have any ideas:
https://github.com/wkeeling/selenium-wire/pull/143#issuecomment-664555087
Many thanks @lapp0 for the PR. It will be a few days before I can take a look but I'll try and figure out why request._body is falling over.
hi @wkeeling, please let me know if you'll have some chance to look at it
@lapp0 sorry for the delay on this. I started to look at it and then wondered whether we should use the wsproto library for handling the websocket communication rather than trying to use our own implementation. The wsproto library is also what mitmproxy uses (a backend supported by Selenium Wire) and thus by using wsproto we would keep things consistent.
I'm a little pushed for time currently having recently started a new job, but I haven't forgotten and I will look at this as soon as time permits.
The core of Selenium Wire has been reworked and the old core thrown out, largely to address issues with performance. As a consequence we get much improved websocket handling for free - and websocket capture has been much easier to implement. Many thanks for your ideas and work on this issue initially. The current implementation stores websocket messages in memory and we may look to your original suggestions to improve this over time (e.g. writing the messages to a pickle object as they arrive).
I'm quite busy at the moment, but if I come back to this around the holiday are you open to an MR pickling WS messages and allowing access to Response.messages based on an optional argument to instantiation of seleniumwire?
Certainly would be open to improving the storage of websocket messages, as currently they're just held in a list in memory. It would be better if they could be persisted and it would ensure they would scale.
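One possible shape for that persistence, sketched under assumptions (the PickledMessageLog class is hypothetical and not part of Selenium Wire): append each message to a file as its own pickle record as it arrives, then read the records back in order.

```python
import os
import pickle
import tempfile


class PickledMessageLog:
    """Hypothetical append-only log: one pickle record per websocket message."""

    def __init__(self, path=None):
        if path is None:
            # Assumption: a temporary file is an acceptable default location.
            fd, path = tempfile.mkstemp(suffix=".wsmsgs")
            os.close(fd)
        self.path = path

    def append(self, message):
        # Append mode means earlier records are never rewritten.
        with open(self.path, "ab") as fh:
            pickle.dump(message, fh)

    def read_all(self):
        """Return all messages in arrival order."""
        messages = []
        with open(self.path, "rb") as fh:
            while True:
                try:
                    messages.append(pickle.load(fh))
                except EOFError:
                    # pickle.load raises EOFError once the file is exhausted.
                    break
        return messages
```

Appending keeps the per-message cost small and preserves chronological order, though a real implementation would also need to think about concurrent writers and cleanup.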
In terms of message retrieval, the API for that is actually now in place - they can be retrieved using request.ws_messages where request is the originating websocket handshake request (i.e. its URL starts with wss://). The messages themselves are held in chronological order and have a from_client attribute to denote the direction they were sent.
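A small sketch of how that might be used to separate sent from received messages. The ws_messages and from_client names come from the description above; the content attribute, the split_ws_messages helper, and passing driver.requests are assumptions, and stand-in objects are used so the snippet runs without a browser.

```python
from types import SimpleNamespace


def split_ws_messages(requests):
    """Return (sent, received) message contents from websocket handshake requests."""
    sent, received = [], []
    for request in requests:
        # Requests without websocket traffic are assumed to have no messages.
        for message in getattr(request, "ws_messages", []):
            # Assumption: each message exposes its payload as `content`.
            (sent if message.from_client else received).append(message.content)
    return sent, received


# Stand-in objects; with Selenium Wire you would presumably pass
# driver.requests here instead.
handshake = SimpleNamespace(
    url="wss://example.com/socket",
    ws_messages=[
        SimpleNamespace(from_client=True, content="ping"),
        SimpleNamespace(from_client=False, content="pong"),
    ],
)
sent, received = split_ws_messages([handshake])
```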
@wkeeling we haven't got the functionality to send messages to the intercepted websocket connections if I'm not mistaken?
@ankurpandeyvns yes unfortunately it is not currently possible to send data to web socket connections, only capture the data that was sent and received. It sounds as though this is a requirement for you?
Yeah it would have been great if the functionality was there. It's not a necessary requirement but would have been great if it was there.
I'm trying to view websocket data using selenium. I'm testing out the recent changes that @wkeeling so kindly applied in https://github.com/wkeeling/selenium-wire/commit/af8247908e69a37d5e3f69a4a3c690e199953754
However, I'm a bit confused. The linked code doesn't appear to add websocket data to the response. The expected behavior is that either response.body or response.messages would contain a list of websocket messages.
Here is the code I attempted:
And here is me observing a lack of websocket data available:
Is this code a WIP, or is there something more I'm missing?