odota / core

Open source Dota 2 data platform
https://www.opendota.com

implement feedv2 API #1620

Closed: howardchung closed this issue 5 years ago

howardchung commented 6 years ago

@bippum is interested in working on this

7596ff commented 6 years ago

Some design thoughts I had so far:

howardchung commented 6 years ago
7596ff commented 6 years ago

That's true: due to the nature of streams, they will receive any new matches that fit their criteria. And as you mentioned, we already need one Redis client per connection, so that's not an issue.

howardchung commented 6 years ago

I think we can have only one stream, and evaluate whether each blob fits the criteria before we send it to the client.

Otherwise we may have a ton of streams, and cleanup will be more challenging.
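
A rough sketch of that per-client check (the criteria shape with players/teams/leagues arrays is an assumption for illustration; the match fields follow the usual API match blob):

// Sketch: does a match blob satisfy one client's criteria?
// The criteria object shape (players/teams/leagues arrays) is an assumption.
function matchesCriteria(match, criteria) {
  if (criteria.leagues && !criteria.leagues.includes(match.leagueid)) {
    return false;
  }
  if (
    criteria.teams &&
    !criteria.teams.includes(match.radiant_team_id) &&
    !criteria.teams.includes(match.dire_team_id)
  ) {
    return false;
  }
  if (
    criteria.players &&
    !match.players.some((p) => criteria.players.includes(p.account_id))
  ) {
    return false;
  }
  return true;
}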

howardchung commented 6 years ago

Potential concern: if we keep a lot of history, evaluating the backlog when a request starts may be CPU-bound (and since Node is single-threaded, it'll lock up the web instance).

7596ff commented 6 years ago

What if we just use Postgres instead? This is such a small feature that dynamically creating and destroying tables, along with lots of writes and occasional reads, stays cheap, and the performance loss compared to Redis isn't significant.

7596ff commented 6 years ago

Here's a more fleshed out design document.

Design

As matches come in from scanner, they would be imported into the feed process much the same way as they are currently, using RPUSH/LPOP or a pub/sub system.
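
For illustration, the RPUSH/LPOP variant might look like the sketch below, using ioredis and a hypothetical feedQueue list key (not the real key name):

const Redis = require('ioredis');
const redis = new Redis();

// Scanner side: push each new match onto a Redis list (key name is hypothetical).
async function publishMatch(match) {
  await redis.rpush('feedQueue', JSON.stringify(match));
}

// Feed side: block until a match is available, then hand it to the dispatcher.
async function consumeLoop(onMatch) {
  while (true) {
    const [, raw] = await redis.blpop('feedQueue', 0); // 0 = block indefinitely
    onMatch(JSON.parse(raw));
  }
}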

Clients will register with the Service using their API key and Client ID. The Client ID can be generated as a UUID or possibly a Snowflake. Clients will start a long-lived HTTP request similar to Twitter Streams.
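
The long-lived request could be held open roughly like this Express sketch; the route path, the query parameter names, and the newline-delimited JSON framing are all assumptions:

const express = require('express');
const app = express();

const subscribers = new Map(); // clientId -> writer for the live connection

app.get('/api/feed', (req, res) => {
  const { key, clientId } = req.query; // names are assumptions; key would be validated
  res.setHeader('Content-Type', 'application/x-ndjson');
  res.flushHeaders(); // commit headers so the client sees the stream open

  subscribers.set(clientId, (match) => res.write(JSON.stringify(match) + '\n'));
  req.on('close', () => subscribers.delete(clientId));
});

app.listen(5000);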

Each Client's criteria are stored in Redis under a key containing its Client ID, with the value being a stringified JSON object. Each time the Client requests a criteria change, the key is updated.
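
Storing the criteria could then be a single key per client; the feedCriteria: prefix is made up for the sketch:

const Redis = require('ioredis');
const redis = new Redis();

// Save or replace a client's criteria (key prefix is hypothetical).
async function setCriteria(clientId, criteria) {
  await redis.set(`feedCriteria:${clientId}`, JSON.stringify(criteria));
}

// Load the criteria back when evaluating incoming matches.
async function getCriteria(clientId) {
  const raw = await redis.get(`feedCriteria:${clientId}`);
  return raw ? JSON.parse(raw) : null;
}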

Upon a Client's initial connection, a table in the Postgres database is created. The schema will be as follows:

CREATE TABLE feed_CLIENT_ID (
    seq INT NOT NULL,
    matchid BIGINT NOT NULL,   -- BIGINT: Dota 2 match IDs already exceed the 32-bit INT range
    timestamp BIGINT NOT NULL, -- milliseconds since epoch (JS integer date format), too large for INT
    players INT[],
    teams INT[],
    leagues INT[],
    PRIMARY KEY (seq, matchid)
);

players, teams, and leagues are arrays of the criteria IDs that matched, i.e. the IDs because of which the match was inserted into the table.

On a successful match of a Client's criteria, a sequence number is generated, and the match is inserted into the Client's table and posted to the Client's streaming connection at the same time. The data inserted into the table is exactly the same data that is sent to the Client.
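
Sketched with node-postgres, that step could look like the following; the per-client sequence counter in Redis and the table-name interpolation are assumptions, and the Client ID would need strict validation (e.g. as a UUID) before being spliced into SQL:

const Redis = require('ioredis');
const { Pool } = require('pg');
const redis = new Redis();
const pool = new Pool();

// Sketch: persist the match for replay, then push it to the live connection.
// `send` is the per-client writer registered at connect time; `matched` holds
// the criteria IDs this match hit.
async function dispatch(clientId, send, match, matched) {
  const seq = await redis.incr(`feedSeq:${clientId}`); // key name is hypothetical
  const row = {
    seq,
    matchid: match.match_id,
    timestamp: Date.now(),
    players: matched.players,
    teams: matched.teams,
    leagues: matched.leagues,
  };
  // clientId must already be validated before use in a table name.
  await pool.query(
    `INSERT INTO feed_${clientId} (seq, matchid, timestamp, players, teams, leagues)
     VALUES ($1, $2, $3, $4, $5, $6)`,
    [row.seq, row.matchid, row.timestamp, row.players, row.teams, row.leagues]
  );
  send(row); // exactly the same data that was persisted
}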

If a Client disconnects at any time (due to network issues, etc.), it may reconnect by including a sequence number in its initial packet; the Service will send every match in its table after that sequence number up to the current time, then continue streaming matches normally.
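
The reconnect path is then a single ordered query before rejoining the live stream; as above, the table naming follows the schema sketch and is an assumption:

const { Pool } = require('pg');
const pool = new Pool();

// Sketch: send everything after the client's last-seen sequence number,
// in order, then fall through to normal live streaming.
async function replay(clientId, lastSeq, send) {
  const { rows } = await pool.query(
    `SELECT seq, matchid, timestamp, players, teams, leagues
       FROM feed_${clientId}
      WHERE seq > $1
      ORDER BY seq ASC`,
    [lastSeq]
  );
  rows.forEach(send);
}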

Issues

The Service might need to prevent duplicate connections with the same Client ID, or simply not care and stream to both connections at once.

As for cleaning tables, a job could be scheduled hourly to delete data older than a retention window: 7 days, 14 days, or as little as 24 hours, at the administrator's discretion.
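
The cleanup job might be as small as the sketch below; enumerating the per-client tables via information_schema is one possible approach, and the 7-day retention is one arbitrary choice from the range above:

const { Pool } = require('pg');
const pool = new Pool();

const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // retention window is configurable

// Sketch: delete rows older than the retention window from every feed table.
async function cleanup() {
  const cutoff = Date.now() - RETENTION_MS;
  const { rows } = await pool.query(
    `SELECT table_name FROM information_schema.tables
     WHERE table_name LIKE 'feed_%'`
  );
  for (const { table_name } of rows) {
    await pool.query(`DELETE FROM ${table_name} WHERE timestamp < $1`, [cutoff]);
  }
}

setInterval(cleanup, 60 * 60 * 1000); // hourly, per the suggestion above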

howardchung commented 6 years ago

Questions:

7596ff commented 6 years ago
howardchung commented 5 years ago

Redis 5 is now out! Let's build this