odota / core

Open source Dota 2 data platform
https://www.opendota.com

implement feedv2 API #1620

Closed: howardchung closed this issue 5 years ago

howardchung commented 6 years ago

@bippum is interested in working on this

7596ff commented 6 years ago

Some design thoughts I had so far:

howardchung commented 6 years ago
7596ff commented 6 years ago

That's true: due to the nature of streams, they will receive any new matches that fit their criteria. And as you mentioned, we already need one Redis client per connection, so that's not an issue.

howardchung commented 6 years ago

I think we can have only one stream, and evaluate whether each blob fits the criteria before we send it to the client.

Otherwise we may have a ton of streams, and cleanup will be more challenging.
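
A rough sketch of that per-client check (the criteria shape with players/teams/leagues arrays is an assumption for illustration; the match fields follow the usual API match blob):

// Sketch: does a match blob satisfy one client's criteria?
// The criteria object shape (players/teams/leagues arrays) is an assumption.
function matchesCriteria(match, criteria) {
  if (criteria.leagues && !criteria.leagues.includes(match.leagueid)) {
    return false;
  }
  if (
    criteria.teams &&
    !criteria.teams.includes(match.radiant_team_id) &&
    !criteria.teams.includes(match.dire_team_id)
  ) {
    return false;
  }
  if (
    criteria.players &&
    !match.players.some((p) => criteria.players.includes(p.account_id))
  ) {
    return false;
  }
  return true;
}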

howardchung commented 6 years ago

Potential concern: if we keep a lot of history, evaluating the backlog when a request starts may be CPU-bound (and since Node is single-threaded, it'll lock up the web instance).

7596ff commented 6 years ago

What if we just use Postgres instead? This is such a small feature that dynamically creating and destroying tables, along with lots of writes and occasional reads, stays cheap, and the performance loss compared to Redis isn't significant.

7596ff commented 6 years ago

Here's a more fleshed out design document.

Design

As matches come in from scanner, they would be imported into the feed process much the same way as they are currently, using RPUSH/LPOP or a pub/sub system.
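
For illustration, the RPUSH/LPOP variant might look like the sketch below, using ioredis and a hypothetical feedQueue list key (not the real key name):

const Redis = require('ioredis');
const redis = new Redis();

// Scanner side: push each new match onto a Redis list (key name is hypothetical).
async function publishMatch(match) {
  await redis.rpush('feedQueue', JSON.stringify(match));
}

// Feed side: block until a match is available, then hand it to the dispatcher.
async function consumeLoop(onMatch) {
  while (true) {
    const [, raw] = await redis.blpop('feedQueue', 0); // 0 = block indefinitely
    onMatch(JSON.parse(raw));
  }
}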

Clients will register with the Service using their API key and Client ID. The Client ID can be generated as a UUID or possibly a Snowflake. Clients will start a long-lived HTTP request similar to Twitter Streams.
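
The long-lived request could be held open roughly like this Express sketch; the route path, the query parameter names, and the newline-delimited JSON framing are all assumptions:

const express = require('express');
const app = express();

const subscribers = new Map(); // clientId -> writer for the live connection

app.get('/api/feed', (req, res) => {
  const { key, clientId } = req.query; // names are assumptions; key would be validated
  res.setHeader('Content-Type', 'application/x-ndjson');
  res.flushHeaders(); // commit headers so the client sees the stream open

  subscribers.set(clientId, (match) => res.write(JSON.stringify(match) + '\n'));
  req.on('close', () => subscribers.delete(clientId));
});

app.listen(5000);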

Each Client's criteria are stored in Redis under a key containing its Client ID, with the value being a stringified JSON object. Each time the Client requests a criteria change, the key is updated.
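
Storing the criteria could then be a single key per client; the feedCriteria: prefix is made up for the sketch:

const Redis = require('ioredis');
const redis = new Redis();

// Save or replace a client's criteria (key prefix is hypothetical).
async function setCriteria(clientId, criteria) {
  await redis.set(`feedCriteria:${clientId}`, JSON.stringify(criteria));
}

// Load the criteria back when evaluating incoming matches.
async function getCriteria(clientId) {
  const raw = await redis.get(`feedCriteria:${clientId}`);
  return raw ? JSON.parse(raw) : null;
}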

Upon a Client's initial connection, a table in the Postgres database is created. The schema will be as follows:

CREATE TABLE feed_CLIENT_ID (
    seq INT NOT NULL,
    matchid BIGINT NOT NULL,   -- BIGINT: Dota 2 match IDs already exceed the 32-bit INT range
    timestamp BIGINT NOT NULL, -- milliseconds since epoch (JS integer date format), too large for INT
    players INT[],
    teams INT[],
    leagues INT[],
    PRIMARY KEY (seq, matchid)
);

players, teams, and leagues are arrays of the criteria IDs that matched, i.e. the IDs because of which the match was inserted into the table.

On a successful match of a Client's criteria, a sequence number is generated, and the match is inserted into the Client's table and posted to the Client's streaming connection at the same time. The data inserted into the table is exactly the same data that is sent to the Client.
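
Sketched with node-postgres, that step could look like the following; the per-client sequence counter in Redis and the table-name interpolation are assumptions, and the Client ID would need strict validation (e.g. as a UUID) before being spliced into SQL:

const Redis = require('ioredis');
const { Pool } = require('pg');
const redis = new Redis();
const pool = new Pool();

// Sketch: persist the match for replay, then push it to the live connection.
// `send` is the per-client writer registered at connect time; `matched` holds
// the criteria IDs this match hit.
async function dispatch(clientId, send, match, matched) {
  const seq = await redis.incr(`feedSeq:${clientId}`); // key name is hypothetical
  const row = {
    seq,
    matchid: match.match_id,
    timestamp: Date.now(),
    players: matched.players,
    teams: matched.teams,
    leagues: matched.leagues,
  };
  // clientId must already be validated before use in a table name.
  await pool.query(
    `INSERT INTO feed_${clientId} (seq, matchid, timestamp, players, teams, leagues)
     VALUES ($1, $2, $3, $4, $5, $6)`,
    [row.seq, row.matchid, row.timestamp, row.players, row.teams, row.leagues]
  );
  send(row); // exactly the same data that was persisted
}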

If a Client disconnects at any time (due to network issues, etc.), it may reconnect by including a sequence number in its initial packet; the Service will send every match in its table after that sequence number up to the current time, then continue streaming matches normally.
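
The reconnect path is then a single ordered query before rejoining the live stream; as above, the table naming follows the schema sketch and is an assumption:

const { Pool } = require('pg');
const pool = new Pool();

// Sketch: send everything after the client's last-seen sequence number,
// in order, then fall through to normal live streaming.
async function replay(clientId, lastSeq, send) {
  const { rows } = await pool.query(
    `SELECT seq, matchid, timestamp, players, teams, leagues
       FROM feed_${clientId}
      WHERE seq > $1
      ORDER BY seq ASC`,
    [lastSeq]
  );
  rows.forEach(send);
}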

Issues

The Service might need to prevent duplicate connections with the same Client ID, or simply not care and stream to both connections at once.

As for cleaning tables, a job could be scheduled hourly to delete data older than a retention window: 7 days, 14 days, or as little as 24 hours, at the administrator's discretion.
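
The cleanup job might be as small as the sketch below; enumerating the per-client tables via information_schema is one possible approach, and the 7-day retention is one arbitrary choice from the range above:

const { Pool } = require('pg');
const pool = new Pool();

const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // retention window is configurable

// Sketch: delete rows older than the retention window from every feed table.
async function cleanup() {
  const cutoff = Date.now() - RETENTION_MS;
  const { rows } = await pool.query(
    `SELECT table_name FROM information_schema.tables
     WHERE table_name LIKE 'feed_%'`
  );
  for (const { table_name } of rows) {
    await pool.query(`DELETE FROM ${table_name} WHERE timestamp < $1`, [cutoff]);
  }
}

setInterval(cleanup, 60 * 60 * 1000); // hourly, per the suggestion above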

howardchung commented 6 years ago

Questions:

7596ff commented 6 years ago
howardchung commented 5 years ago

Redis 5 is now out! Let's build this