Open wohltat opened 7 months ago
@wohltat your test does not do what you think is doing e.g.
# allocate and prepare fixed block of data to be sent
b = bytearray(10_000)
for k in range(100):
p = k*100
b[p:p] = b'X'*96 + f'{k:3}' + '\n'
mv = memoryview(b)
if you add
print(len(b), b[-100:])
You will see that, 1) it has a length of 20_000 and 2) there are invalid UTF bytes.
Also here
...
await ws.send_str(b[0:100*n].decode())
n += 1
...
this is sending an infinite growing string, so I'm not sure what you want to test with this...
anyway here is a proper test (working on unix and esp32 ports)
import aiohttp
import asyncio
import sys
import time
# allocate and prepare fixed block of data to be sent
b = bytearray(10_000)
for k in range(100):
p = k * 100
b[p : p + 100] = b"X" * 96 + f"{k:3}" + "\n"
mv = memoryview(b)
# print(len(b), b[-100:])
try:
size = int(sys.argv.pop(-1))
except Exception:
size = 2000
# sys.exit()
URL = "wss://<URL>:8448/echo"
sslctx = False
if URL.startswith("wss:"):
try:
import ssl
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
sslctx.verify_mode = ssl.CERT_NONE
except Exception:
pass
async def ws_receive(ws: ClientWebSocketResponse):
try:
async for msg in ws:
if msg.type != aiohttp.WSMsgType.TEXT:
print("type:", msg.type, repr(msg.data))
except (TypeError, OSError) as e:
print("ws_receive", repr(e))
async def ws_test_echo(session):
n = 0
l = 0
m = 0
ws_receive_task = None
async with session.ws_connect(URL, ssl=sslctx) as ws:
ws_receive_task = asyncio.create_task(ws_receive(ws))
try:
while True:
t0 = time.ticks_ms()
while (n * 100) + size < len(b):
await ws.send_str(b[n * 100 : (n * 100) + size].decode())
dt = time.ticks_diff(time.ticks_ms(), t0) / 1e3
print(
"-------------------",
f"packet # {m} |",
f"total: {l/1000:.1f} KB |",
f"{(size/1000)/dt:.1f} KB/s |",
f"{((size*8)/dt)/1e3:.1f} kbps",
"------------------------",
end="\r",
)
n += 1
m += 1
l += size
t0 = time.ticks_ms()
await asyncio.sleep_ms(100)
n = 0
except KeyboardInterrupt:
pass
finally:
await ws.close()
async def main():
async with aiohttp.ClientSession() as session:
print(f"session test @ packet size: {size/1000:.1f} KB")
await ws_test_echo(session)
def run():
asyncio.run(main())
if __name__ == "__main__":
run()
and server
import ssl
from microdot import Microdot
from microdot.websocket import with_websocket
app = Microdot()
html = """<!DOCTYPE html>
<html>
<head>
<title>Microdot Example Page</title>
<meta charset="UTF-8">
</head>
<body>
<div>
<h1>Microdot Example Page</h1>
<p>Hello from Microdot!</p>
<p><a href="/shutdown">Click to shutdown the server</a></p>
</div>
</body>
</html>
"""
@app.route("/")
async def hello(request):
return html, 200, {"Content-Type": "text/html"}
@app.route("/echo")
@with_websocket
async def echo(request, ws):
t0 = time.ticks_ms()
l = 0
m = 0
while True:
data = await ws.receive()
size = len(data)
l += size
dt = time.ticks_diff(time.ticks_ms(), t0) / 1e3
print(
"-------------------",
f"packet # {m} |",
f"total: {l/1000:.1f} KB |",
f"{(size/1000)/dt:.1f} KB/s |",
f"{((size*8)/dt)/1e3:.1f} kbps",
"------------------------",
end="\r",
)
m += 1
t0 = time.ticks_ms()
# print(data[-4:].replace("\n", ""), end="\r")
await ws.send(data)
@app.route("/shutdown")
async def shutdown(request):
request.app.shutdown()
return "The server is shutting down..."
ssl_cert = "cert.pem"
ssl_key = "key.pem"
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.load_cert_chain(ssl_cert, ssl_key)
app.run(port=8448, debug=True, ssl=sslctx)
client
$ micropython ws_echo_test.py 2000
session test @ packet size: 2.0 KB
------------------- packet # 49 | total: 98.0 KB | 19.4 KB/s | 155.3 kbps -------------------------
server
$ micropython ws_server_test.py
Starting async server on 0.0.0.0:8448...
------------------- packet # 570 | total: 1142.0 KB | 19.2 KB/s | 153.8 kbps ------------------------
Hi, thanks for quick your answer.
Your are right with the size of the bytearray, but the first 10_000 bytes stay the same so i can't see how this influences the test. I tried it again a i still have the same problems.
this is sending an infinite growing string, so I'm not sure what you want to test with this...
This was my intention. I wanted to test with varying and increasing packet sizes. Your test has the same size of 2kb for every packet which was never a problem on the local server and is ocasionally failing on the remote server (that i don't know exactly what it is doing).
If i set the size = 5000
, i can reliably make it fail on the local server right away:
Traceback (most recent call last):
File "<stdin>", line 93, in <module>
File "<stdin>", line 89, in run
File "asyncio/core.py", line 1, in run
File "asyncio/core.py", line 1, in run_until_complete
File "asyncio/core.py", line 1, in run_until_complete
File "<stdin>", line 85, in main
File "<stdin>", line 55, in ws_test_echo
File "/lib/aiohttp/aiohttp_ws.py", line 239, in send_str
File "/lib/aiohttp/aiohttp_ws.py", line 189, in send
File "asyncio/stream.py", line 1, in drain
OSError: -104
With increasing packet sizes the point of failure is almost always at (or very close to) 4k for the local server. Strangely the connection to the remote server fails rather randomly between 2k and 8k. But i don't know what kind of server that is.
The other problem that may be connected is that there is a memory problem, that may be caused by memery fragmentation i guess. I have no other explanation since there should be enough memory available. When i use a little memory in between the packets than the following memory error might occur. Not always though, often the connections breaks because of the before mentioned errors:
gc.mem_free()=82864 gc.mem_alloc()=34192
------------------- 36 ------------------------
gc.mem_free()=82832 gc.mem_alloc()=34224
------------------- 37 ------------------------
gc.mem_free()=82800 gc.mem_alloc()=34256
------------------- 38 ------------------------
Traceback (most recent call last):
File "<stdin>", line 66, in <module>
File "asyncio/core.py", line 1, in run
File "asyncio/core.py", line 1, in run_until_complete
File "asyncio/core.py", line 1, in run_until_complete
File "<stdin>", line 62, in main
File "<stdin>", line 48, in ws_test_echo
File "/lib/aiohttp/aiohttp_ws.py", line 239, in send_str
File "/lib/aiohttp/aiohttp_ws.py", line 184, in send
File "/lib/aiohttp/aiohttp_ws.py", line 137, in _encode_websocket_frame
MemoryError: memory allocation failed, allocating 3809 bytes
I used the following client code on the ESP32 (The server ran on my pc under linux). The list l
is to cause increase memory fragmentation:
import aiohttp
import asyncio
import gc
# allocate and prepare fixed block of data to be sent
b = bytearray(10_000)
for k in range(100):
p = k*100
b[p:p+100] = b'X'*96 + f'{k:3}' + '\n'
mv = memoryview(b)
print(len(b), b[-100:])
l = []
URL = "wss://somewebsocketserver"
sslctx = False
if URL.startswith("wss:"):
try:
import ssl
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
sslctx.verify_mode = ssl.CERT_NONE
except Exception:
pass
async def ws_receive(ws : ClientWebSocketResponse):
try:
async for msg in ws:
if msg.type != aiohttp.WSMsgType.TEXT:
print('type:', msg.type, repr(msg.data))
except TypeError as e:
print('ws_receive', repr(e))
async def ws_test_echo(session):
n = 1
while True:
ws_receive_task = None
async with session.ws_connect(URL, ssl=sslctx) as ws:
ws_receive_task = asyncio.create_task(ws_receive(ws))
try:
while True:
gc.collect()
print(f'{gc.mem_free()=} {gc.mem_alloc()=}')
print('-------------------', n, '------------------------')
await ws.send_str(b[0:100*n].decode())
n += 1
l.append('abcdefghijkl' + str(n))
await asyncio.sleep_ms(300)
except KeyboardInterrupt:
pass
finally:
await ws.close()
async def main():
async with aiohttp.ClientSession() as session:
await ws_test_echo(session)
if __name__ == "__main__":
asyncio.run(main())
Your are right with the size of the bytearray, but the first 10_000 bytes stay the same so i can't see how this influences the test.
Because at some point it was sending non-UTF characters so it looked like data corruption.
This was my intention. I wanted to test with varying and increasing packet sizes. Your test has the same size of 2kb for every packet which was never a problem on the local server and is ocasionally failing on the remote server (that i don't know exactly what it is doing). If i set the size = 5000, i can reliably make it fail on the local server right away:
Ok I see, I'm testing this in unix port at both sides, with increased sizes and at some point it "breaks" .i.e at 8000-10000 for wss
and > 16000 for plain ws
(Continuation frames not supported
). I'm not sure if this has anything to do with the aiohttp_ws
implementation or if the "bug" is at lower level, asyncio streams maybe?
This may have something to do with partial writes 🤔 ... I may need to do further testing with CPython, and also with plain asyncio streams and see if I can reproduce this...
I tried it again with a python-websockets server described in https://github.com/micropython/micropython-lib/issues/853#issue-2283450782.
The program also freezes and i get the following on the server side:
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 24
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 25
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 26
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 27
% sending keepalive ping
> PING 04 4b f2 e3 [binary, 4 bytes]
= connection is CLOSED
! failing connection with code 1006
connection handler failed
Traceback (most recent call last):
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 963, in transfer_data
message = await self.read_message()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1033, in read_message
frame = await self.read_data_frame(max_size=self.max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1108, in read_data_frame
frame = await self.read_frame(max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1165, in read_frame
frame = await Frame.read(
^^^^^^^^^^^^^^^^^
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/framing.py", line 68, in read
data = await reader(2)
^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/streams.py", line 750, in readexactly
await self._wait_for_data('readexactly')
File "/usr/lib/python3.11/asyncio/streams.py", line 543, in _wait_for_data
await self._waiter
File "/usr/lib/python3.11/asyncio/selector_events.py", line 974, in _read_ready__get_buffer
nbytes = self._sock.recv_into(buf)
^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/server.py", line 236, in handler
await self.ws_handler(self)
File "/home/username/tls-test/server_secure.py", line 16, in hello
name = await websocket.recv()
^^^^^^^^^^^^^^^^^^^^^^
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 568, in recv
await self.ensure_open()
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 939, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: no close frame received or sent
connection closed
or also this
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 35
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 36
< CLOSE 1005 (no status received [internal]) [0 bytes]
= connection is CLOSING
> CLOSE 1005 (no status received [internal]) [0 bytes]
x closing TCP connection
= connection is CLOSED
connection handler failed
Traceback (most recent call last):
File "/home/boogieman/Documents/projects/rfid/websockets_python/microdot/examples/tls/venv/lib/python3.11/site-packages/websockets/legacy/server.py", line 236, in handler
await self.ws_handler(self)
File "/home/boogieman/Documents/projects/rfid/websockets_python/microdot/examples/tls/server_secure.py", line 16, in hello
name = await websocket.recv()
^^^^^^^^^^^^^^^^^^^^^^
File "/home/boogieman/Documents/projects/rfid/websockets_python/microdot/examples/tls/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 568, in recv
await self.ensure_open()
File "/home/boogieman/Documents/projects/rfid/websockets_python/microdot/examples/tls/venv/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 948, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedOK: received 1005 (no status received [internal]); then sent 1005 (no status received [internal])
connection closed
The websockets implementation is based on https://github.com/danni/uwebsockets ,so it may be interesting to test that and see if it breaks too. Using Wireshark may help 🤔, and something like iperf3 for websockets would be nice.... also there is https://github.com/micropython/micropython/issues/12819, and https://github.com/orgs/micropython/discussions/14427#discussioncomment-9337471
[EDIT] this may help too esp.osdebug(True, esp.LOG_INFO)
and I'm seeing
this
I (35117) wifi:bcn_timeout,ap_probe_send_start
I (37617) wifi:ap_probe_send over, resett wifi status to disassoc
I (37617) wifi:state: run -> init (c800)
I (37617) wifi:pm stop, total sleep time: 4746559 us / 30202076 us
I (37627) wifi:<ba-del>idx
I (37627) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (37637) wifi: STA_DISCONNECTED, reason:200:beacon timeout
Traceback (most recent call last):
File "lv_utils.py", line 148, in task_handler
FI (37647) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
ile "lv_mpdiI (37657) wifi:state: init -> auth (b0)
I (37657) wifi:state: auth -> assoc (0)
I (37667) wifi:state: assoc -> run (10)
File "board_config.py", line 146, in blit
File "board_config.py", line 106, in send_data
File "protocol.py", line 185, in recv
File "protocol.py", line 92, in read_frame
OSError: [Errno 113] ECONNABORTED
So it looks like a wifi beacon timeout...
I took a closer look in wireshark and found something i didn't expect. Although i'm only sending from the esp32 to the pc server via websockets, there are there are a lot of TCP packets that i don't know what they are doing.
The TLS packets with increasing size are probably the ones with the payload. As one expect they increase by 100 byte every time. The other packets are: 54 byte -> Client (to 192.168.178.64) 1494 byte -> Server (to 192.168.178.58) 54 byte -> Client 1494 byte -> Server 54 byte -> Client then the next TLS Packet
When the connection breaks there are those TCP packets missing. This causes OSError 113 (= ECONNABORTED).
As i understand it OSError is not from micropython but the Espressif IDF. Is this maybe some kind of connection-alive-checking mechanism that could be turned of? When watching the websockets communication with a client running locally on my pc there are not those additional TCP-packets. I also find them rather large with around 3kb "useless" data transfer for potentially just a single byte of payload.
As i understand it OSError is not from micropython but the Espressif IDF. Is this maybe some kind of connection-alive-checking mechanism that could be turned of?
I've tried to increase the beacon timeout but it only takes longer to throw the error after the socket becomes unresponsive, it seems like the whole network stack stops...I've just tested the solution at https://github.com/micropython/micropython/issues/12819 but it still stops at some point...same with enabling debug/verbose mode, it stops too...well I think this may be important to read it carefully... https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/wifi.html#wi-fi-buffer-usage
@wohltat my conclusion is that after https://github.com/micropython/micropython/pull/13219 and https://github.com/micropython/micropython/pull/12141 (see e.g. https://github.com/micropython/micropython/pull/12141#issuecomment-1670418514) this has become more relevant
Why Buffer Configuration Is Important In order to get a high-performance system, consider the memory usage/configuration carefully for the following reasons:
- the available memory in ESP32 is limited.
- currently, the default type of buffer in LwIP and Wi-Fi drivers is "dynamic", which means that both the LwIP and Wi-Fi share memory with the application. Programmers should always keep this in mind; otherwise, they will face a memory issue, such as "running out of heap memory".
- it is very dangerous to run out of heap memory, as this will cause ESP32 an "undefined behavior". Thus, enough heap memory should be reserved for the application, so that it never runs out of it. the Wi-Fi throughput heavily depends on memory-related configurations, such as the TCP window size and Wi-Fi RX/TX dynamic buffer number.
- the peak heap memory that the ESP32 LwIP/Wi-Fi may consume depends on a number of factors, such as the maximum TCP/UDP connections that the application may have.
- the total memory that the application requires is also an important factor when considering memory configuration.
Due to these reasons, there is not a good-for-all application configuration. Rather, it is recommended to consider memory configurations separately for every different application.
see also https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/lwip.html#lwip-performance
possible solutions:
I've spend some time researching. So far i don't have a complete solution but some insights:
It helps to change some options to get some more RAM. This helped to send significant longer messages before the connection broke
i also tried debug output (CONFIG_LWIP_DEBUG=y
) and i found that in mbedtls there is an unexpected error:
'DBG:/IDF/components/mbedtls/mbedtls/library/ssl_msg.c:2145: ssl->f_send() returned -26752 (-0x6880)'
0x6880 SSL - Connection requires a write call
(https://github.com/godotengine/godot/issues/59216)
Could the have something to do with missing backpressure as mentioned in python-websockets
This would make sense since everytime the error occurs, the client send message N and the server just received message N-2 or less. So that would mean that the messages are not send and the send-buffer is filling up. But i'm not quite sure about that theory, what do you think?
i found compilation times very slow. Every time only one option is changed the whole idf has to be recompiled which takes like 3-4 minutes on my computer. Is there a way to accelerate this process?
I tried to remove bluetooth from the firmware to free up some more memory but i always get error messages? I could not find anything about how to correctly remove bt. I'm not quite sure but i think i remember that you said something about removing bt before.
I also tried compiling with IDF v5.2, which is supported according to the micropython documentation. That turn out to not work at all so far.
MBEDTLS_ERR_MPI_ALLOC_FAILED
when running my application. Then i followed the steps to optimize memory for Mbed TLSGuru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled.
It helps to change some options to get some more RAM. This helped to send significant longer messages before the connection broke
Yes the problem is basically this:
Solution is preallocate what you really need in MicroPython and make sure it does not allocate over a certain size leaving enough memory for idf heap/wifi buffers. Also be aware that slicing e.g. b[0:100*n]
allocates memory so in your example this is going to lead to memory fragmentation and then the above mentioned "overallocation"
i found compilation times very slow. Every time only one option is changed the whole idf has to be recompiled which takes like 3-4 minutes on my computer. Is there a way to accelerate this process?
yes but it will depend on the number of core/threads of your computer use -j <N>
option with the make
command,
$ man make
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If there is more
than one -j option, the last one is effective. If the -j option is given without an
argument, make will not limit the number of jobs that can run simultaneously.
I also tried compiling with IDF v5.2, which is supported according to the micropython documentation. That turn out to not work at all so far.
IDF v5.1.2 it's the latest I've tested which seems to work.
@Carglglz I have an application i want to send TLS / SSL packages over websockets using aiohttp on an ESP32. The problem is that the websockets fail after a short while when using packages that have a little bigger size around 2kB to 4kB.
Here is a simple test script:
remote
I'll get the following exception after a while. Just dealing with the exception is not satisfying since it takes a while and the applicaiton is blocked in that time.
(OSError: 113 = ECONNABORTED)
Noteworthy is that there are always some packages that micropyhon thinks are sent already but don't reach the other side. I also do not receive a websocket close package. This was tested on a remote Websocket server that i don't control.
local
When i try it on a local server, i see different problems.
(OSError: 104 = ECONNRESET)
The servers seems to not receive the complete message / only a corrupted message and closes the connection:
The closing of the websocket is also not handled properly by the micropython application / aiohttp. The close package is received but the connection is not closed automatically.
Another problem is that even if i deal with the exceptions and reconnect automatically, my application will eventually crash because of a run out of memory. This is really problematic to me since i only send messages of size 2k or so. I guess this is because of the not so memory economical design of aiohttp. For sending and receiving there is always a lot of new memory allocation involved. The use of preallocation and
memoryview
seems reasonable here.This is the local python websocketserver code:
MicroPython v1.23.0-preview.344.gb1ac266bb.dirty on 2024-04-29; Generic ESP32 module with ESP32