louking / tm-csv-connector

Time Machine - CSV Connector - creates CSV file from time machine input, for scoring software to read
Apache License 2.0
1 stars 0 forks source link

results view: sometimes results are lost from the reader to backend #24

Closed louking closed 10 months ago

louking commented 10 months ago

possibly this is after "long" delay, but haven't found way to reproduce this reliably

consider using simple_websocket in tsm-csv-reader to add automated ping/pong

@miguelgrinberg can simple_websocket be used with async?

miguelgrinberg commented 10 months ago

Depends on what you mean by "async". If you mean asyncio, then not right now, but I plan to add it. If you mean gevent or eventlet, then yes.

louking commented 10 months ago

Yes, asyncio. I was considering replacing websockets.connect in

https://github.com/louking/tm-csv-connector/blob/35c9d90a92661a63e86bb437111287da13010d83/tm-reader-client/app.py#L158-L188

I guess I'll do the ping/pong at the application level.

While I've been developing software professionally since 1979 and using python since maybe 2009 or so, this is my first project with websockets or asyncio, so it's a learning experience.

miguelgrinberg commented 10 months ago

In terms of the protocol ping/pong, typically it is the server that sends the ping, and clients respond with the pong. Sounds like you are looking to do the reverse, which may work to keep the connection alive, but it is unusual, so doing it in the app may make more sense.

louking commented 10 months ago

FWIW, websockets client is already sending pings, so I shouldn't need to do this at the application level. I'll monitor the logging and see if there's anything useful logged by the client around the time a message is lost. I'll also figure out how to log on the server side.

2023-09-05 12:48:48,107 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 ?? = connection is CONNECTING
2023-09-05 12:48:48,109 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > GET /tm_reader HTTP/1.1
2023-09-05 12:48:48,109 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > Host: tm.localhost:8080
2023-09-05 12:48:48,110 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > Upgrade: websocket
2023-09-05 12:48:48,111 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > Connection: Upgrade
2023-09-05 12:48:48,111 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > Sec-WebSocket-Key: U0MLJcM5ucZYb/AegfuQtA==
2023-09-05 12:48:48,112 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > Sec-WebSocket-Version: 13
2023-09-05 12:48:48,112 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
2023-09-05 12:48:48,113 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > User-Agent: Python/3.10 websockets/11.0.3
2023-09-05 12:48:48,128 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < HTTP/1.1 101
2023-09-05 12:48:48,128 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < Server: nginx/1.25.2
2023-09-05 12:48:48,128 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < Date: Tue, 05 Sep 2023 16:48:48 GMT
2023-09-05 12:48:48,129 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < Connection: upgrade
2023-09-05 12:48:48,131 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < Upgrade: WebSocket
2023-09-05 12:48:48,131 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < Sec-WebSocket-Accept: CJ0FYQdU2kyw+lhx0AzeC81OnDM=
2023-09-05 12:48:48,131 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=15
2023-09-05 12:48:48,131 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None = connection is OPEN
2023-09-05 12:48:48,132 tm-csv-connector INFO: websocket to backend connected
2023-09-05 12:49:08,119 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None % sending keepalive ping
2023-09-05 12:49:08,119 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > PING a0 dd 3d 7c [binary, 4 bytes]
2023-09-05 12:49:08,125 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < PONG a0 dd 3d 7c [binary, 4 bytes]
2023-09-05 12:49:08,126 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None % received keepalive pong
2023-09-05 12:49:28,114 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None % sending keepalive ping
2023-09-05 12:49:28,114 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None > PING 83 0d b4 04 [binary, 4 bytes]
2023-09-05 12:49:28,118 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None < PONG 83 0d b4 04 [binary, 4 bytes]
2023-09-05 12:49:28,119 websockets.client DEBUG: c33a2526-5d07-4778-b5fc-2283d9105809 None % received keepalive pong
louking commented 10 months ago

@miguelgrinberg - I can't see if there's any logging by flask-sock or what it uses (I think simple_websocket > wsproto > h11). Am I missing something?

miguelgrinberg commented 10 months ago

I have not implemented logging myself, and I don't recall seeing any logging options in wsproto either.

louking commented 10 months ago

Instance of lost message:

2023-09-06 16:45:23,046 tm-csv-connector DEBUG: websocket sending to backend: {"opcode": "select", "raceid": "3", "pos": 12, "time": "00:38:29.26", "bibno": 12}
2023-09-06 16:45:23,047 tm-csv-connector INFO: websocket to backend closed, reconnecting
2023-09-06 16:45:23,056 tm-csv-connector INFO: websocket to backend connected

access log during that period (these are browser requests for updated data to the browser view, but no websocket connect requests were seen):

2023-09-06 16:45:22 172.19.0.1 - - [06/Sep/2023:20:45:22 +0000] "GET /results/rest?since=2023-09-06T16:45:21-04:00 HTTP/1.1" 200 1408 "http://tm.localhost:8080/results" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
2023-09-06 16:45:23 172.19.0.1 - - [06/Sep/2023:20:45:23 +0000] "GET /results/rest?since=2023-09-06T16:45:22-04:00 HTTP/1.1" 200 1408 "http://tm.localhost:8080/results" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
2023-09-06 16:45:24 172.19.0.1 - - [06/Sep/2023:20:45:24 +0000] "GET /results/rest?since=2023-09-06T16:45:23-04:00 HTTP/1.1" 200 1408 "http://tm.localhost:8080/results" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"

updated with more details:

2023-09-07 12:47:06,140 tm-csv-connector DEBUG: websocket sending to backend: {"opcode": "select", "raceid": "3", "pos": 43, "time": "06:17:19.51", "bibno": 19}
2023-09-07 12:47:06,141 tm-csv-connector INFO: websocket to backend closed, reconnecting (<class 'websockets.exceptions.ConnectionClosedError'>, no close frame received or sent)
2023-09-07 12:47:06,157 tm-csv-connector INFO: websocket to backend connected

note this was after chrome started delaying query interval, not sure if that's relevant

2023-09-07 12:46:14 172.19.0.1 - - [07/Sep/2023:16:46:14 +0000] "GET /results/rest?since=2023-09-07T12:45:14-04:00 HTTP/1.1" 200 2324 "http://tm.localhost:8080/results" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
2023-09-07 12:47:14 172.19.0.1 - - [07/Sep/2023:16:47:14 +0000] "GET /results/rest?since=2023-09-07T12:46:14-04:00 HTTP/1.1" 200 2324 "http://tm.localhost:8080/results" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
2023-09-07 12:48:14 172.19.0.1 - - [07/Sep/2023:16:48:14 +0000] "GET /results/rest?since=2023-09-07T12:47:14-04:00 HTTP/1.1" 200 2324 "http://tm.localhost:8080/results" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
miguelgrinberg commented 10 months ago

I'm sorry, but I don't know what you are showing me here and how or if it is related to any of my open source packages.

louking commented 10 months ago

@miguelgrinberg thanks for your diligence. I'm trying to track down why I'm losing messages from the websockets client to the flask-sock server, so I'm keeping notes about it. If I have a specific question for you I'll tag you.

Unfortunately I didn't collect enough information with this last log, but if you're interested it looks like the connection was closed when I tried to send a message. websockets (package) has more information about the reason for the closure, but again I didn't log it and websockets didn't log it either (using INFO). As we discussed there's no logging on the server side (with flask-sock).

I need this interface to be reliable and I'm doing something wrong apparently in my use of websockets (technology) that's causing me to drop messages. Frankly I may just give up on using websockets for this interface and do http POST instead.

miguelgrinberg commented 10 months ago

Sorry, I thought you were asking me to look at those logs.

louking commented 10 months ago

@shanholtzerk see https://github.com/louking/tm-csv-connector/issues/24#issuecomment-1709094533. Occasionally (not sure the stimulus), the websocket drops at the time a message is sent, and that message is lost. I could implement that the message isn't removed from the queue until the async send returns successfully, but then (I think) I have to deduplicate at the receiver.

Another alternative is to abandon websockets tech for this communication path and just use http POST. I think that HTTP/TCP will make sure it gets through.

Do you have an opinion?

shanholtzerk commented 10 months ago

You may have better luck with REST. Have you tried using wireshark to look at the packets?

louking commented 10 months ago

Hmm, I hadn't thought of using wireshark. Maybe I'll try to reproduce this one more time with wireshark running. I'm not sure, though, what I would learn that would change how I would fix the problem.

One oddity: the problem seen in https://github.com/louking/tm-csv-connector/issues/24#issuecomment-1709094533 seems to occur more frequently when the browser client websocket isn't as active or attached. The browser websocket connects to a server in tm-reader-client process, and checks if the process is connected to the time machine, in order to display the Connect/Disconnect button correctly.

By "isn't as active or attached" I mean: if I click on another tab in the browser window, eventually chrome slows down the setInterval used to check the button

https://github.com/louking/tm-csv-connector/blob/c28bd436f111462aef570014a15cab6a179e6911/app/src/tm_csv_connector/static/results.js#L57

and if I navigate to another view the client connection is closed.

louking commented 10 months ago

I'm confused by what I'm seeing with wireshark. I'm seeing WebSocket Text messages from the browser but not between the backend and tm process. Most likely I'm monitoring the wrong interface.

No matter as I will go with a REST interface.