howardchung closed this issue 5 years ago
Some design thoughts I had so far:
That's true: because of the nature of streams, they will receive any new matches that fit their criteria. And as you mentioned, we already need one Redis client per connection, so that's not an issue.
I think we can have only one stream, and evaluate whether each blob fits the criteria before we send it to the client.
Otherwise we may have a ton of streams, and cleanup will be more challenging.
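The single-stream approach above needs a per-client filter step. A minimal sketch of that check, assuming a criteria object with optional `players`/`teams`/`leagues` ID arrays and a match blob with OpenDota-style field names (the exact shapes are assumptions, not the real schema):

```javascript
// Decide whether a match blob satisfies one client's criteria before
// writing it to that client's stream. A missing or empty criteria
// field matches everything; populated fields are ANDed together.
function matchesCriteria(match, criteria) {
  const anyOverlap = (wanted, present) =>
    !wanted || wanted.length === 0 || wanted.some((id) => present.includes(id));

  return (
    anyOverlap(criteria.players, match.players.map((p) => p.account_id)) &&
    anyOverlap(criteria.teams, [match.radiant_team_id, match.dire_team_id]) &&
    anyOverlap(criteria.leagues, [match.leagueid])
  );
}
```

Whether multiple populated fields should be ANDed (as here) or ORed is a design choice this sketch leaves open.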
Potential concern: if we keep a lot of history, evaluating the backlog at request start may be CPU-bound (and since Node is single-threaded, it'll lock up the web instance)
What if we just use Postgres instead? This is such a tiny feature that dynamically creating and destroying tables, along with lots of writes and occasional reads, is very cheap, and the performance loss compared to Redis isn't significant.
Here's a more fleshed out design document.
As matches come in from scanner, they would be imported into the feed process much the same way as they are currently, using RPUSH/LPOP or a pub/sub system.
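The scanner-to-feed handoff above is just a FIFO queue. An in-memory stand-in for the RPUSH/LPOP pattern (the real code would use a Redis client; this class only illustrates the semantics):

```javascript
// Minimal in-memory model of the Redis RPUSH/LPOP handoff between the
// scanner (producer) and the feed process (consumer).
class MatchQueue {
  constructor() {
    this.items = [];
  }
  rpush(blob) {
    this.items.push(blob); // scanner appends new matches to the tail
  }
  lpop() {
    return this.items.shift() ?? null; // feed consumes from the head, FIFO
  }
}
```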
Clients will register with the Service using their API key and Client ID. The Client ID can be generated as a UUID or possibly a Snowflake. Clients will start a long-lived HTTP request similar to Twitter Streams.
Each Client's criteria is stored in Redis in a key derived from its Client ID, with the value being a stringified JSON object. Each time the Client requests a criteria change, the key is updated.
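A sketch of that storage scheme; the `feed:criteria:` key prefix is an assumption, and the actual Redis `SET`/`GET` calls are elided:

```javascript
// One Redis key per client; the value is the stringified criteria object.
const criteriaKey = (clientId) => `feed:criteria:${clientId}`;
const serializeCriteria = (criteria) => JSON.stringify(criteria);
const deserializeCriteria = (raw) => JSON.parse(raw);

// In the service, roughly:
//   await redis.set(criteriaKey(clientId), serializeCriteria(criteria));
//   const criteria = deserializeCriteria(await redis.get(criteriaKey(clientId)));
```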
Upon a Client's initial connection, a table in the Postgres database is created. The schema will be as follows:
CREATE TABLE feed_CLIENT_ID (
  seq INT NOT NULL,
  matchid BIGINT NOT NULL, -- BIGINT: match IDs exceed the 32-bit INT range
  timestamp BIGINT NOT NULL, -- milliseconds since epoch (JS Date.now()), which overflows INT
  players INT[],
  teams INT[],
  leagues INT[],
  PRIMARY KEY (seq, matchid)
);
`players`, `teams`, and `leagues` are arrays of the matched IDs that caused the match to be inserted into the table.
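A sketch of how those matched-ID columns could be computed, assuming the same hypothetical match and criteria shapes as elsewhere in this thread:

```javascript
// Which of the client's requested IDs actually appeared in this match;
// the results populate the players/teams/leagues array columns.
function matchedIds(match, criteria) {
  const intersect = (wanted = [], present = []) =>
    wanted.filter((id) => present.includes(id));

  return {
    players: intersect(criteria.players, match.players.map((p) => p.account_id)),
    teams: intersect(criteria.teams, [match.radiant_team_id, match.dire_team_id]),
    leagues: intersect(criteria.leagues, [match.leagueid]),
  };
}
```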
On a successful match against a Client's criteria, a sequence number is generated for the match ID, and the match is inserted into the Client's table and posted to the Client's streaming connection at the same time. The data inserted into the table is exactly the data sent to the client.
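The per-client sequence numbers could be generated like this in-memory sketch; a production version would need to persist the counter (e.g. derive it from `MAX(seq)` in the client's table, or use a Postgres sequence) so it survives restarts:

```javascript
// Per-client monotonically increasing sequence numbers, used both in
// the table's primary key and as the client's resume cursor.
const counters = new Map();

function nextSeq(clientId) {
  const next = (counters.get(clientId) ?? 0) + 1;
  counters.set(clientId, next);
  return next;
}
```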
If a Client disconnects at any time (network issues, etc.), it may reconnect by including a sequence number in its initial packet; the Service will send every match in its table from that sequence number up to the current time, then continue streaming matches normally.
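The backfill step amounts to "everything after the client's last-seen sequence number, in order". In the real service this would be a `SELECT ... WHERE seq > $1 ORDER BY seq` against the client's table; a pure sketch of the same logic:

```javascript
// Given the rows already stored for a client and the sequence number
// from its reconnect packet, return every later row in ascending
// order, after which live streaming resumes.
function backlogSince(rows, lastSeq) {
  return rows
    .filter((r) => r.seq > lastSeq)
    .sort((a, b) => a.seq - b.seq);
}
```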
The Service might need to prevent duplicate connections of the same Client ID, or simply not care and stream to two Clients at once.
As for cleaning tables, a job could be scheduled hourly to delete data older than 7 days, 14 days, or as little as 24 hours, at the administrator's discretion.
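The cleanup job's per-table query could be built like this, assuming the schema above and node-postgres-style parameterized queries. The table name is interpolated directly, which is only safe because the Client ID is generated by the Service (identifiers can't be bound as parameters):

```javascript
// Build the hourly cleanup query for one client's table: delete rows
// whose timestamp (ms since epoch) is older than the retention window.
function cleanupQuery(clientId, retentionDays, nowMs = Date.now()) {
  const cutoff = nowMs - retentionDays * 24 * 60 * 60 * 1000;
  return {
    text: `DELETE FROM feed_${clientId} WHERE timestamp < $1`,
    values: [cutoff],
  };
}
```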
Questions:
Redis 5 is now out! Let's build this
@bippum is interested in working on this
https://api.opendota.com/feed?account_id=1
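An endpoint like the one above would need to turn query parameters into a criteria object. A sketch, where `account_id` comes from the example URL and the `team_id`/`league_id` parameters are assumed extensions:

```javascript
// Parse a feed URL's query string into a criteria object; repeated
// parameters (e.g. ?account_id=1&account_id=2) become multiple IDs.
function parseCriteria(url) {
  const params = new URL(url).searchParams;
  const ids = (name) => params.getAll(name).map(Number);
  return {
    players: ids('account_id'),
    teams: ids('team_id'),
    leagues: ids('league_id'),
  };
}
```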