tardis-dev / tardis-machine

Locally runnable server with built-in data caching, providing both tick-level historical and consolidated real-time cryptocurrency market data via HTTP and WebSocket APIs
https://docs.tardis.dev/api/tardis-machine
Mozilla Public License 2.0

Error when high volumes - Error: Invalid access of closed uWS.WebSocket/SSLWebSocket #26

Open SadaharuEdo opened 7 months ago

SadaharuEdo commented 7 months ago

Hello, I am running tardis-machine with Docker on an EC2 instance (2 vCPUs, 8 GB RAM), alongside other services, and consuming data through its WebSocket API. First of all, the service works very well 99% of the time, and I'd like to express my gratitude for the great work, guys! 👍

However, upon examining the collected data, I've discovered errors in my data during periods of high volume. After further investigation, the container logs show that these high-volume periods trigger errors in tardis-machine (example below):

2023-11-15T18:13:33.444231457Z WebSocket /ws-stream-normalized error: Error: Invalid access of closed uWS.WebSocket/SSLWebSocket.
2023-11-15T18:13:33.446026759Z     at streamNormalizedWS (/usr/local/lib/node_modules/tardis-machine/dist/ws/streamnormalized.js:70:41)
2023-11-15T18:13:33.446039048Z Unhandled Rejection at Promise AbortError: The operation was aborted
2023-11-15T18:13:33.446044057Z     at Object.destroyer (node:internal/streams/destroy:305:11)
2023-11-15T18:13:33.446048731Z     at createAsyncIterator (node:internal/streams/readable:1141:19) {
2023-11-15T18:13:33.446053299Z   code: 'ABORT_ERR'
2023-11-15T18:13:33.446057688Z } Promise {
2023-11-15T18:13:33.446062221Z   <rejected> AbortError: The operation was aborted
2023-11-15T18:13:33.446070455Z       at Object.destroyer (node:internal/streams/destroy:305:11)
2023-11-15T18:13:33.446100580Z       at createAsyncIterator (node:internal/streams/readable:1141:19) {
2023-11-15T18:13:33.446106619Z     code: 'ABORT_ERR'
2023-11-15T18:13:33.446110859Z   }
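
For context, our consumer connects roughly like the sketch below (simplified; it assumes tardis-machine's default WebSocket port 8001 and the documented `options` query parameter, so adjust for your setup; the reconnect-on-close is just our way of resubscribing when the server drops the connection):

```typescript
import WebSocket from 'ws'

// simplified subscription: one exchange, one symbol, trades + book changes
const options = JSON.stringify([
  { exchange: 'binance', symbols: ['btcusdt'], dataTypes: ['trade', 'book_change'] }
])

const url = `ws://localhost:8001/ws-stream-normalized?options=${encodeURIComponent(options)}`

function connect() {
  const ws = new WebSocket(url)

  ws.on('message', (data) => {
    const message = JSON.parse(data.toString())
    // persist / forward the normalized message here
  })

  ws.on('error', (err) => console.error('ws error:', err.message))

  // when the connection drops (e.g. during the high-volume errors above),
  // back off briefly and resubscribe
  ws.on('close', () => setTimeout(connect, 1000))
}

connect()
```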

My questions are:

- Do you think the problem primarily stems from the exchanges themselves, possibly being overloaded by requests during high-traffic periods?
- Alternatively, do you believe I should consider running multiple instances of my container (I've heard about a cluster mode)?
- Should I consider adding more CPUs or RAM to my machine?

Thanks a lot!

zr-mah commented 6 months ago

Hi,

We are facing a similar problem too (see attached screenshot).

SadaharuEdo commented 6 months ago

Hi @zr-mah, in my case I upgraded the instance running tardis-machine to 16 GB RAM and 8 vCPUs, and I have had no issues since migrating my setup to it. The data I ingest now matches close to 99.99% of the exchange's historical data.

So, in my opinion, if you encounter this error, it could be related to the usage of your server: are there other processes running on it? Is it already heavily utilized?

Hope this will help you,

zr-mah commented 6 months ago

Hi @SadaharuEdo,

Thank you for the pointer! What you're saying might be true: this issue only occurs when we call the server concurrently within milliseconds, and the server we were using is an AWS t4g.small with 2 vCPUs and 2 GB RAM.

We didn't think it was this issue because the machine's overall usage is low.

We will definitely try a higher-spec machine.

Thanks!

SadaharuEdo commented 6 months ago

Furthermore, I have enabled cluster mode for tardis-machine (cf. here).
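
For reference, this is roughly how I enable it (a sketch based on the docs; the exact --cluster-mode flag and the TM_-prefixed environment variable names may vary with your tardis-machine version, so double-check against the documentation):

```sh
# running the CLI directly
npx tardis-machine --api-key=YOUR_API_KEY --cluster-mode

# or with Docker, where CLI options map to TM_-prefixed env variables
docker run -p 8000:8000 -p 8001:8001 \
  -e "TM_API_KEY=YOUR_API_KEY" \
  -e "TM_CLUSTER_MODE=true" \
  -d tardisdev/tardis-machine
```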