Closed — Mitch-Martinez closed this pull request 2 years ago
The code that you edited handles both text and binary messages, so it cannot be changed to bytearray as that breaks text.
Ah yeah that’s a huge oversight on my part, I’ll close the PR when I get a chance and see if it can be fixed another way.
Made an update to differentiate between text and binary data.
I opted to split checking for `ByteMessage` and `TextMessage` to avoid always having to call `isinstance()` twice.
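A minimal sketch of what that split dispatch might look like (the event class names and the handler are illustrative, not the library's actual code — in the real codebase the check happens inside the receive loop):

```python
# Hypothetical sketch: dispatch on the event type with if/elif, so the
# common path calls isinstance() once instead of twice per message.
class TextMessage:
    def __init__(self, data):
        self.data = data

class ByteMessage:
    def __init__(self, data):
        self.data = data

def handle_event(event):
    if isinstance(event, TextMessage):
        # text frames stay str
        return "text:" + event.data
    elif isinstance(event, ByteMessage):
        # binary frames can be accumulated separately as mutable bytes
        return b"bytes:" + event.data
    raise TypeError(f"unexpected event type: {type(event)!r}")
```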
Would returning `bytearray` vs `bytes` be an issue for anything using this module? I would assume anybody calling `receive()` would appreciate that the returned data is now mutable, but I could definitely be missing something.
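For context on the mutability point, a quick plain-Python illustration of the difference between the two types (nothing library-specific):

```python
data = bytearray(b"hello")
data += b" world"        # extends in place; no new object is created
data[0] = ord("H")       # bytearray supports item assignment
print(bytes(data))       # b'Hello world'

immutable = b"hello"
# immutable[0] = ord("H")  # would raise TypeError: bytes is immutable
```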
Update to fix flake8 errors. I separated `test_recieve_large` in the server test, so it should now get coverage of the entire diff. I appreciate your patience with all this.
Merging #15 (21045e3) into main (9e024fd) will increase coverage by 0.17%. The diff coverage is 100.00%.

```
@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
+ Coverage   97.66%   97.83%   +0.17%
==========================================
  Files           2        2
  Lines         171      185      +14
  Branches       35       39       +4
==========================================
+ Hits          167      181      +14
  Misses          3        3
  Partials        1        1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/simple_websocket/ws.py | 97.82% <100.00%> (+0.17%) | :arrow_up: |
> Would returning `bytearray` vs `bytes` be an issue for anything using this module?
I'm not sure. I need to find out if `bytearray(some_bytes)` and `bytes(some_bytearray)` are cheap operations, because in that case the array can be converted to bytes. If no data is copied in these conversions, then I'd rather have the payload be delivered as `bytes` to keep things compatible with the current version.
Data is copied in these conversions. Running the following shows the two objects have different `id()` values, which means the `bytearray` lives at a different place in memory:
```python
>>> a = b'somebytedata'
>>> b = bytearray(a)
>>> id(a)
140171735861408
>>> id(b)
140171735743728
```
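A different `id()` only proves the two are distinct objects; to confirm the underlying data is actually copied rather than shared, one can mutate the copy and check that the original is untouched:

```python
a = b"somebytedata"
b = bytearray(a)
b[0] = ord("S")          # mutate the first byte of the copy
print(a)                 # b'somebytedata' -- original is unchanged
print(bytes(b))          # b'Somebytedata'
```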
Knowing that a copy takes place, I measured it with `timeit`. Running `timeit` on the `bytearray(some_bytes)` operation takes a few hundred nanoseconds. I used 4096 bytes because that should be the default maximum message size.

```python
>>> timeit.timeit("b = bytearray(a)", setup="a=b'0'*4096", number=100000)/100000
3.420370000003459e-07
```
Would this be expensive for clients/servers that are sending a ton of small packets at a given time?
I've been focusing on large transfers of data and not paying much attention to the efficiency of small packets. Out of curiosity I ran a similar test with 1MB of data and saw that it takes about 1/10th of a millisecond. Doing this type of conversion only once is a very small price compared to the many copy operations being done currently, though that can only be said for large messages at the moment.
```python
>>> timeit.timeit("b = bytearray(a)", setup="a=b'0'*1048576", number=100000)/100000
8.481695999999829e-05
```
So without further testing, it seems reasonable to generalize that this change adds about 350 nanoseconds per message, but it starts to save time as messages get larger (the MB range is really where the difference becomes visibly noticeable on the command line). Once messages pass 10MB, the current receiver slows down significantly, when the transfer should only take a few tens to hundreds of milliseconds depending on network speed.
I'll probably take some time to test out the time differences with and without the changes to be a little more thorough. It might make sense to only do these conversions if the incoming message has to be reassembled.
Thanks, this was a useful discussion. I ended up implementing a slightly different solution that switches both text and byte messages to bytearray for faster appending, but only on arrival of the second part. So single part messages, which are the most common, do not suffer any performance penalties. Multipart messages are then converted back to str or bytes once the final part is received.
Could you please give the main branch a try, and let me know if this is satisfactory for your use case?
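A rough sketch of the approach described above — defer the `bytearray` conversion until a second message part actually arrives, then convert back on completion. This is illustrative only; the class and method names are hypothetical, not the library's actual code:

```python
class MessageAssembler:
    """Accumulate websocket message parts, switching to bytearray
    only if a second part arrives (multipart message)."""

    def __init__(self):
        self.incoming_message = None

    def add_part(self, data):
        if self.incoming_message is None:
            # single-part messages (the common case) pay no conversion cost
            self.incoming_message = data
            return
        if not isinstance(self.incoming_message, bytearray):
            # second part arrived: switch to bytearray for fast appends
            first = self.incoming_message
            self.incoming_message = bytearray(
                first.encode() if isinstance(first, str) else first
            )
        self.incoming_message += (
            data.encode() if isinstance(data, str) else data
        )

    def finish(self, is_text):
        result = self.incoming_message
        if isinstance(result, bytearray):
            # convert back to str/bytes once the final part is received
            result = result.decode() if is_text else bytes(result)
        self.incoming_message = None
        return result
```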
Sorry for the late reply, just got back from a short vacation.
These changes do work for me when using simple-websockets. How can these changes be reflected in the flask-sock package? This was how I was originally turned onto simple-websockets.
@Mitch-Martinez This fix is now released. Upgrade `simple-websocket` and you'll have access to it from Flask-Sock.
Using `+=` on `bytes` is inefficient because `bytes` is immutable. Making `incoming_message` a `bytearray` allows `+=` to efficiently reallocate memory for `incoming_message` when incoming messages are large.

It is not the usual use case for websockets, but when a client attempts to send a 50MB file, the `+=` operation on a `bytes` object becomes extremely expensive. After changing the `incoming_message` object, the performance of reading large messages improved greatly, mainly for ones larger than 10MB.
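The effect is easy to reproduce outside the library. A small self-contained comparison (chunk size and part count chosen arbitrarily for illustration): each `+=` on `bytes` copies the entire accumulated message so far, giving quadratic total work, while `bytearray` extends in place with amortized linear cost.

```python
import time

CHUNK = b"0" * 4096
PARTS = 1000  # ~4MB total

def append_bytes():
    msg = b""
    for _ in range(PARTS):
        msg += CHUNK      # each += copies the whole message so far: O(n^2)
    return msg

def append_bytearray():
    msg = bytearray()
    for _ in range(PARTS):
        msg += CHUNK      # in-place extension, amortized O(n)
    return bytes(msg)

for fn in (append_bytes, append_bytearray):
    start = time.perf_counter()
    fn()
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s")
```

Both functions produce identical output; only the time spent differs, and the gap grows quickly with message size.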