alxhotel opened this issue 4 years ago
I think that supporting a redis backend is a great idea!
Also – can you explain a bit more about how the multiprocess websocket processes currently work? Do you shard based on the torrent info hash? Is this code open source somewhere?
@breebee Thanks for sharing the link to aquatic. However, we can't directly run a tracker inside a browser because browsers can't start TCP listening servers.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
Redis for swarm peer storage and pub/sub for sending offers/answers between socket servers sounds great.
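For the storage half, a rough sketch of what that could look like with ioredis (the key names, interval and sample size here are made up for illustration, not taken from any existing tracker code):

```js
// Keep swarm membership in Redis instead of process memory, so any
// socket-server process can answer an announce. Peers drop out of the
// swarm automatically if they stop re-announcing.
const Redis = require('ioredis')
const redis = new Redis()

const ANNOUNCE_INTERVAL = 120 // seconds, hypothetical value

async function announce (infoHash, peerId) {
  const key = `swarm:${infoHash}`
  await redis.sadd(key, peerId)
  // Refresh the expiry on every announce; the whole swarm disappears
  // if nobody re-announces within two intervals.
  await redis.expire(key, ANNOUNCE_INTERVAL * 2)
  // Hand back a random sample of peers for the announcing peer to connect to.
  return redis.srandmember(key, 20)
}
```

A real implementation would probably want per-peer expiry (e.g. a sorted set scored by last announce time) instead of expiring the whole swarm key, but the idea is the same.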
Bugout could be used to offload some signaling to users.
Currently
The current implementation of the websocket tracker is based on the same single-process design as the HTTP and UDP trackers.
This is ok-ish for an HTTP and UDP tracker, since neither of those needs to maintain an open connection to the peer.
However, for a WebSocket tracker this is hard to scale to a high number of peers: every new peer means a persistent connection (CPU + RAM consumption), and messages between peers must be relayed through the server.
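For reference, a single tracker process today looks roughly like this, with one Server instance handling HTTP, UDP and WebSocket announces (options and port are illustrative, based on the bittorrent-tracker README):

```js
const Server = require('bittorrent-tracker').Server

// One process, one Server instance, serving all three protocols.
const server = new Server({
  udp: true,   // enable the UDP tracker
  http: true,  // enable the HTTP tracker
  ws: true,    // enable the WebSocket tracker
  stats: true  // web-based statistics
})

server.on('error', (err) => console.error(err.message))
server.on('listening', () => console.log('tracker listening'))

server.listen(8000)
```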
Problem
Right now, the bittorrent-tracker package is using around 12.8 KB per connection. By changing the ws package to the uws package, the memory consumption per connection already drops to 3.2 KB (75% less). Despite this improvement in memory, there's still a bottleneck in CPU.

More peers => more messages => more CPU consumption.
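To make the comparison concrete, the connection handling is essentially a ws server like the sketch below; the old uws package aimed to be an API-compatible drop-in replacement, so the swap is mostly a change of the require line (the message handling here is simplified and illustrative):

```js
const WebSocket = require('ws')
// const WebSocket = require('uws') // old uws: intended as a drop-in replacement

const wss = new WebSocket.Server({ port: 8000 })

wss.on('connection', (socket) => {
  // Every peer keeps a socket open for its whole session, which is
  // where the per-connection memory cost comes from.
  socket.on('message', (data) => {
    const msg = JSON.parse(data) // announce / offer / answer (simplified)
    // ...update the swarm and relay offers/answers to other peers...
  })
})
```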
Possible solutions?
Right now the OpenWebtorrent tracker solves this by using multiple processes, as seen in the image below:
This design improves the server's ability to handle more connections, since the websocket logic is now distributed across multiple processes. With it, the server goes from handling around 50k peers (with a single process) to around 100k.
However, as you might have imagined, the one process that handles all the "queries" to the database becomes the bottleneck. When a high number of peers connect to the server, that process cannot keep up with the volume of messages and eventually runs out of memory.
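In code form, the design looks roughly like the sketch below, using Node's built-in cluster module; the message shapes and the 20-peer sample are invented for illustration, not taken from the actual OpenWebtorrent code, but the key point is visible: every announce funnels through the single "database" process in the master.

```js
const cluster = require('cluster')
const os = require('os')

if (cluster.isMaster) {
  // Single "database" process: keeps every swarm in memory and answers
  // lookups coming from the websocket workers over IPC.
  const swarms = new Map() // infoHash -> Set of peerIds

  for (let i = 0; i < os.cpus().length - 1; i++) {
    const worker = cluster.fork()
    worker.on('message', ({ infoHash, peerId }) => {
      if (!swarms.has(infoHash)) swarms.set(infoHash, new Set())
      const swarm = swarms.get(infoHash)
      swarm.add(peerId)
      // Reply with a small sample of the swarm for this peer.
      worker.send({ peerId, infoHash, peers: [...swarm].slice(0, 20) })
    })
  }
} else {
  // Websocket worker: terminates the connections and forwards every
  // announce to the master ("database") process over IPC.
  const WebSocket = require('ws')
  const sockets = new Map() // peerId -> socket
  const wss = new WebSocket.Server({ port: 8000 + cluster.worker.id })

  wss.on('connection', (socket) => {
    socket.on('message', (data) => {
      // Hypothetical announce format: { info_hash, peer_id }
      const { info_hash: infoHash, peer_id: peerId } = JSON.parse(data)
      sockets.set(peerId, socket)
      process.send({ infoHash, peerId })
    })
  })

  // Relay the master's reply back to the right websocket.
  process.on('message', ({ peerId, peers }) => {
    const socket = sockets.get(peerId)
    if (socket && socket.readyState === WebSocket.OPEN) {
      socket.send(JSON.stringify({ peers }))
    }
  })
}
```

Here each worker listens on its own port for simplicity; a real deployment would put a load balancer in front, but either way nothing distributes the master's work, which is exactly the bottleneck described above.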
So the solution seems simple: we just need to replace that process with a fault-tolerant system that distributes the load on the "database" across different processes. The first thing that came to my mind is a pub/sub system, such as Redis pub/sub.
These kinds of systems might even allow part of the in-memory state to be stored temporarily on disk, trading some extra delay for lower memory usage. This way we could build a cluster of "database processes" capable of handling a much higher number of peers.
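For the signaling side (offers/answers between peers connected to different socket servers), a pub/sub relay could look roughly like this, again with ioredis; the channel/key names and the message shape are invented for this sketch:

```js
const Redis = require('ioredis')

const pub = new Redis() // used for publishing and lookups
const sub = new Redis() // a subscribing connection can't issue other commands

const SERVER_ID = process.env.SERVER_ID || String(process.pid)
const localSockets = new Map() // peerId -> websocket of locally connected peers

// Each socket server listens on its own channel for messages relayed to it.
sub.subscribe(`tracker:server:${SERVER_ID}`)
sub.on('message', (channel, raw) => {
  const msg = JSON.parse(raw)
  const socket = localSockets.get(msg.toPeerId)
  if (socket) socket.send(JSON.stringify(msg)) // deliver the offer/answer locally
})

// When a locally connected peer announces, record which server it lives on.
async function registerPeer (peerId, socket) {
  localSockets.set(peerId, socket)
  await pub.hset('tracker:peer-location', peerId, SERVER_ID)
}

// Forward an offer/answer to a peer that may live on another server.
async function relay (toPeerId, payload) {
  const serverId = await pub.hget('tracker:peer-location', toPeerId)
  if (serverId === SERVER_ID) {
    const socket = localSockets.get(toPeerId)
    if (socket) socket.send(JSON.stringify(payload))
  } else if (serverId) {
    await pub.publish(`tracker:server:${serverId}`, JSON.stringify({ toPeerId, ...payload }))
  }
}
```

A real implementation would also need to clean up the tracker:peer-location entries when peers disconnect, which is glossed over here.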
Please feel free to share your feedback, ideas and PRs :)