Update documents to be clearer for non-native speakers.

GingerMoon commented 3 years ago

I tried hard and spent a lot of time trying to understand the architecture doc, but I still failed to understand it. I would appreciate it if anyone could help me and answer my questions about the document (I am sorry if the answers to these questions are obvious):

In Glossary

Notification A message sent to an endpoint node intended for delivery to a HTTP endpoint. Autopush stores these in the message tables.

What does this sentence mean exactly?

"Autopush stores these in the message tables." According to doc,

Endpoint nodes handle all Notification POST requests, looking up in DynamoDB to see what Push server the UAID is connected to. The Endpoint nodes then attempt delivery to the appropriate connection node. If the UAID is not online, the message may be stored in DynamoDB in the appropriate message table.

So it's "autoendpoint (endpoint node)", NOT "autopush (connection node)", that stores these notifications in the message tables.

2. Router Table Schema

node_id Hostname of the connection node the client is connected to.

Does "client" here mean the same thing as "user agent"?

Autopush uses an optimistic deletion policy for node_id to avoid delete calls when not needed. During a delivery attempt, the endpoint will check the node_id for the corresponding UAID. If the client is not connected, it will clear the node_id record for that UAID in the router table.

Considering the concepts below: autopush (connection node), autoendpoint (endpoint node)

So "Autopush" is confusing here. It's autoendpoint instead of autopush that clears the node_id record, right?

Is the primary key of the router table <uaid(partition key),node_id(sort key)> ? (Sorry to ask, where can I find the schema in the source?)

The last_connect has a secondary global index on it to allow for maintenance scripts to locate and purge stale client records and messages.

What does "secondary global index" here mean? How does "secondary global index" allow for maintenance scripts to locate and purge stale client records and messages?

If table rotation is disabled, the last message table used will become ‘frozen’ and will be used for all future messages.

Is it saying "If table rotation is NOT disabled"? How is "Message Table Rotation" implemented?

6. In Rules for Endpoints

Read the chan list entry from the appropriate month message table to see if its a valid channel.

What does it mean? Where is the chan list coming from?

Don't quite understand "Rules for Connection Nodes"....

jrconlin commented 3 years ago

First: THANK YOU! It is too easy for us to use language shortcuts which can be confusing to anyone who does not understand them.

I will try to fix these in the document, when I can, but I hope the following helps.

Many dictionaries will provide a word and then follow it with its definition. For these, it's presumed that the word "is" can be inserted. So

Notification A message sent to an endpoint node

would be the same as saying

A Notification is a message sent to an endpoint node

This is "shorthand" (a quick form of style) that removes some repetition.

This definition states that Notifications are the messages sent to an endpoint node (also known as a "Subscription update" in the RFC), which are stored in the messages database table.

It may help to understand that Autopush is actually two different programs. There is the "connection handler" (which, unfortunately, is called autopush and does not help remove confusion) that maintains the long-lived websocket connection between the browser (also called the user-agent) and the Autopush service, and the "endpoint handler" (called autoendpoint) which accepts messages from third party webpush providers and routes them to the appropriate service.

If it's any help whatsoever, please consider how we speak of this service internally:

We refer to the entire service (databases, programs, etc.) as the Web Push service.
The program that handles the User Agent connections is autopush.
The program that handles the Subscription Updates sent by third-party, outside (non-Mozilla, or customer) providers is autoendpoint.

Unfortunately, due to legacy reasons, we can't change the name of the project.

(Skipping a few of the other questions because they may be related)

3. (& 4)

Is the primary key of the router table <uaid(partition key),node_id(sort key)> ? (Sorry to ask, where can I find the schema in the source?)

DynamoDB is a Key/Value database (sometimes called NoSQL) and does not have a traditional SQL Schema. Instead, you define a Primary Key and a Secondary Hash and use those to refer to arbitrary fields. Think of it like having data in a barrel you store in a warehouse. You know the aisle and shelf a barrel is on, but finding it by anything else requires you to keep another index. Fortunately, you can store whatever you like in that barrel.

This means having a schema doesn't really make sense because you would have many, many optional fields.
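
As a rough illustration of the barrel analogy (not Autopush's actual table layout), the sketch below models a key/value table as a Python dict with a two-part key. The class `KeyValueTable`, its `put` and `get` methods, and all field names are hypothetical. The key terms follow the comment above; DynamoDB's own documentation calls the two parts the partition key and the sort key.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str]  # (primary key, secondary key): the "aisle" and "shelf"


class KeyValueTable:
    """Toy in-memory stand-in for a key/value table (hypothetical, not DynamoDB)."""

    def __init__(self) -> None:
        self._items: Dict[Key, Dict[str, Any]] = {}

    def put(self, primary: str, secondary: str, **fields: Any) -> None:
        # Only the two key parts are fixed; the remaining fields are arbitrary.
        self._items[(primary, secondary)] = dict(fields)

    def get(self, primary: str, secondary: str) -> Optional[Dict[str, Any]]:
        # Lookup works only through the key, like knowing the aisle and shelf.
        return self._items.get((primary, secondary))


table = KeyValueTable()
# Two "barrels" in the same table holding completely different contents.
table.put("uaid-1", "chan-a", data="hello", ttl=60)
table.put("uaid-1", "chan-b", headers={"encoding": "aes128gcm"})
```

Finding items by anything other than the key (for example, by `ttl`) would need a separate index, which is the "another index" in the analogy.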

As much as I don't like to say it, your best chance to understand what's going on is to look at the code.

Table rotation was originally done because AWS did not offer a TTL (Time To Live) option, which would allow data to be automatically deleted once it had expired. Table rotation was complicated, annoying, and broke often. Table rotation is not used anymore, and we are very happy about that.
"chan list" was shorthand for the "Channel List". This is the list of channels associated with the User Agent. The Channel List was one of the many records stored in the data barrel (see #3 (& 4) above). I'll also say that we were being "clever" by storing this in a DynamoDB record that had a Primary Key (the User Agent ID, or UAID) but no Secondary Hash (well, kind of: we used a single space character). This was where we stored information about the user in general. I'll apologize, because "clever" often leads to "confusing", and that's undoubtedly the case here.
Connection nodes are the autopush connection handlers. Each connection handler can only accept a limited number of connections, so we have to run many of them to handle all the users of our system. When a user agent connects to a connection node, the connection node "registers" that user in the routing table. When an incoming message for a given UAID is accepted by the endpoint handler, the endpoint handler looks in the routing table to find the connection node that the message should be routed to.
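
The routing flow described above can be sketched roughly as follows. This is a minimal in-memory model under stated assumptions, not the Autopush implementation: the dicts `router_table`, `stored_messages`, and `live_connections` and the functions `connect`, `disconnect`, `deliver`, and `pending` are all hypothetical names. The stale-entry cleanup follows the "optimistic deletion" behavior quoted from the architecture doc earlier in this thread, and the expiry check only imitates what a TTL option does.

```python
import time
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory stand-ins for the routing and message tables.
router_table: Dict[str, str] = {}  # UAID -> connection node id
stored_messages: Dict[str, List[Tuple[float, str]]] = {}  # UAID -> [(expiry, message)]
# UAIDs each connection node currently holds open (simulates live websockets).
live_connections: Dict[str, Set[str]] = {}


def connect(uaid: str, node_id: str) -> None:
    """Connection node side: register the user agent in the routing table."""
    live_connections.setdefault(node_id, set()).add(uaid)
    router_table[uaid] = node_id


def disconnect(uaid: str, node_id: str) -> None:
    """The socket closes; the routing entry is deliberately left in place."""
    live_connections.get(node_id, set()).discard(uaid)


def deliver(uaid: str, message: str, ttl: float, now: Optional[float] = None) -> str:
    """Endpoint side: route to the connection node, or store the message."""
    now = time.time() if now is None else now
    node_id = router_table.get(uaid)
    if node_id is not None:
        if uaid in live_connections.get(node_id, set()):
            return f"delivered via {node_id}"
        # Stale entry: clear it only now, when a delivery attempt fails.
        del router_table[uaid]
    stored_messages.setdefault(uaid, []).append((now + ttl, message))
    return "stored"


def pending(uaid: str, now: Optional[float] = None) -> List[str]:
    """Return stored messages that have not passed their expiry time."""
    now = time.time() if now is None else now
    return [msg for expiry, msg in stored_messages.get(uaid, []) if expiry > now]
```

In this model, calling `connect("ua1", "node-a")` and then `deliver("ua1", "m1", ttl=60)` routes the message through `node-a`; after `disconnect("ua1", "node-a")`, the next `deliver` call removes the stale routing entry and stores the message instead.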

Again, thank you for these questions. I will try to make the documents a bit clearer. The new Autopush-rs server works in the same way as this server and is what we currently use in production for connection handling. We're hoping to get the second part (the endpoint handler) running soon, but we need to test it out.

GingerMoon commented 3 years ago

Thanks @jrconlin!

jrconlin commented 3 years ago

Going to reopen this, because I still need to make the documentation better. 😉

mozilla-services / autopush

Update documents to be clearer for non-native speakers. #1439