thegreenwebfoundation / admin-portal

The member portal and platform powering The Green Web Foundation
Apache License 2.0

Add a high level outline of how our checks work and how data flows through the system #491

Closed mrchrisadams closed 8 months ago

mrchrisadams commented 1 year ago

Note: I'm updating our key flows here in this issue, with a view to putting them into our docs so we can refer to them during a planned database migration. Using GitHub issues makes it easy to make sequence diagrams.

Our usual, fast check

Most of our API traffic works like this. When our own site does a greencheck, or a third party uses our greencheck API, the flow looks like the diagram below.

We prioritise returning a result quickly from a cache, then queue an update to the cache and a write to a logging table by passing the domain to RabbitMQ.

```mermaid
sequenceDiagram

    Browser client->>+Nginx: Look up domain
    Nginx->>+Django Web: Forward to a django web process
    Django Web->>+Database: Look up domain
    Database->>-Django Web: Return lookup result
    Django Web->>+Nginx: Return rendered result

    Nginx->>+Browser client: Present domain result
    Django Web->>+RabbitMQ: Queue domain for logging
```
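
The fast path above can be sketched in plain Python. A dict stands in for the green domains cache table and `queue.Queue` for RabbitMQ; all names here are illustrative, not the project's actual code:

```python
import queue

# Stand-ins for the real infrastructure (hypothetical names):
green_domain_cache = {"climateaction.tech": {"green": True, "hosted_by": "Hetzner"}}
logging_queue = queue.Queue()  # stands in for RabbitMQ

def fast_greencheck(domain: str) -> dict:
    """Return a cached result immediately, then queue the domain for logging."""
    result = green_domain_cache.get(domain, {"green": False})
    # Fire-and-forget: a worker will log the check and refresh the cache later.
    logging_queue.put(domain)
    return result

print(fast_greencheck("climateaction.tech"))  # served from the cache, no network lookup
```

The key property is that the response never waits on a network lookup or a log write; both happen later, off the request path.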

Updating our green domains table

Once we have the domain queued, another worker takes the domain and does two things.

  1. It updates our green domain table, which acts somewhat like a cache, so we can return results quickly.
  2. It then logs the check for later aggregate analysis.

Because the greenchecks can be comparatively slow, this allows us to return results over the API quickly, at the expense of a check result being a little out of date. It also lets us control the load we place on the database, by choosing when we log all the checks to it.

```mermaid
sequenceDiagram

    Django Dramatiq->>+RabbitMQ: Check for any domains to log
    RabbitMQ->>+Django Dramatiq: Return domain to log
    Django Dramatiq->>+Database: Log domain to greencheck table
    Django Dramatiq->>+Database: Update greendomain tables
```

We can scale these two independently, depending on the traffic we are receiving; RabbitMQ acts as a buffer between them.
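
The worker's two steps can be sketched with the same stand-ins as before (a `queue.Queue` for RabbitMQ, plain Python containers for the two tables; `perform_full_check` is a hypothetical placeholder for the real lookup):

```python
import queue

logging_queue = queue.Queue()  # stands in for RabbitMQ
greencheck_log = []            # stands in for the greencheck logging table
green_domain_cache = {}        # stands in for the greendomain cache table

def perform_full_check(domain: str) -> dict:
    # Hypothetical placeholder for the real (slow) network lookup.
    return {"green": True, "hosted_by": "Example Green Host"}

def process_one(q: queue.Queue) -> None:
    """Pull one queued domain, log the check, then refresh the cache entry."""
    domain = q.get()
    result = perform_full_check(domain)
    greencheck_log.append((domain, result))  # 1. append-only log for aggregate analysis
    green_domain_cache[domain] = result      # 2. cache refresh, so fast reads stay warm
    q.task_done()

logging_queue.put("example.com")
process_one(logging_queue)
```

Because the worker is the only writer on this path, running more (or fewer) worker processes changes how fast the queue drains without touching the web tier.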

We also have a slower check that does all of this in one synchronous request. Here we prioritise accuracy over response time, and skip any lookup against a local cache table. We still queue the check for a worker to log, and to update the green domain cache as usual, but this path always shows the result of a full network lookup, for cases where we suspect that the result in our cache table is stale.

The result looks like so:

```mermaid
sequenceDiagram

    Browser client->>+Nginx: Send a request to check a website domain
    Nginx->>+Django Web: Forward to a django web process
    Django Web->>+External Network: Look up domain
    External Network->>+Django Web: Return domain lookup
    Django Web->>+Database: Clear old cached domain lookup from database
    Django Web->>+Nginx: Return rendered result

    Nginx->>+Browser client: Present result for website check
    Django Web->>+RabbitMQ: Queue domain for logging
```
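
The slow path can be sketched the same way; the handler skips the cache read, evicts any stale cached row, and still queues the domain so the worker re-logs and re-caches it as usual (all names here are hypothetical stand-ins):

```python
import queue

logging_queue = queue.Queue()                              # stands in for RabbitMQ
green_domain_cache = {"stale.example": {"green": False}}   # a possibly stale cached row

def perform_full_check(domain: str) -> dict:
    # Hypothetical placeholder for the real synchronous network lookup.
    return {"green": True, "hosted_by": "Example Green Host"}

def slow_greencheck(domain: str) -> dict:
    """Prioritise accuracy: full lookup, evict the stale cache row, still queue for logging."""
    result = perform_full_check(domain)    # always a fresh network result
    green_domain_cache.pop(domain, None)   # clear old cached domain lookup
    logging_queue.put(domain)              # worker will re-log and repopulate the cache
    return result

print(slow_greencheck("stale.example"))  # fresh result, never the cached one
```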
mrchrisadams commented 8 months ago

This is merged into our docs and visible on RTD:

https://greenweb.readthedocs.io/en/latest/understanding-the%20flow-of-a-greencheck.html#