letsencrypt / boulder

An ACME-based certificate authority, written in Go.
Mozilla Public License 2.0
5.2k stars 606 forks source link

Allow work-splitting for ocsp-updater #5625

Closed jcjones closed 3 years ago

jcjones commented 3 years ago

Of similar importance than #5620 is to allow ocsp-updater to split out workload. The findStaleOCSPResponses method right now finds all responses, while, since we have an index on serial, we could bucket on the last hex character of the serial varchar, such as:

statuses, err := sa.SelectCertificateStatuses(
        updater.readOnlyDbMap,
        `WHERE ocspLastUpdated < :lastUpdate
         AND NOT isExpired
                 AND RIGHT(serial,1) IN ( :bucketList )
         ORDER BY ocspLastUpdated ASC
         LIMIT :limit`,
        map[string]interface{}{
            "lastUpdate": oldestLastUpdatedTime,
            "limit":      batchSize,
                        "bucketList": strings.Join(bucketChars, ",")
        },
    )

The OCSP Updater as-is can only one as one instance, because were additional instances running, they would (generally) acquire the same CertificateStatus objects from the database and race to update them, each clobbering each other's changes, adding contention to the database to no net gain.

This change allows the Updater to split out its work, requesting only the CertificateStatus objects whose serials end with one of the hexadecimal characters configured. This will allow us to run multiple copies of the OCSP Updater.

Then we can run one for all chars 0-8 and one for 9-f (or whatever), potentially reading from separate read replicas.

jcjones commented 3 years ago

Current query:

analyze format=json select * from certificateStatus where ocspLastUpdated < now() and not isExpired order by ocspLastUpdated ASC limit 10; 
 {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.111887981,
    "table": {
      "table_name": "certificateStatus",
      "partitions": ["p_20210426", "p_20210526", "p_20210625", "p_20210725"],
      "access_type": "ref",
      "possible_keys": ["isExpired_ocspLastUpdated_idx"],
      "key": "isExpired_ocspLastUpdated_idx",
      "key_length": "2",
      "used_key_parts": ["isExpired"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 16421932,
      "r_rows": 10,
      "r_table_time_ms": 0.078410448,
      "r_other_time_ms": 0.024007277,
      "filtered": 29.71552086,
      "r_filtered": 100,
      "attached_condition": "certificateStatus.isExpired <=> 0 and certificateStatus.ocspLastUpdated < '2021-09-02 01:59:37'"
    }
  }
} 

vs split query:

analyze format=json select * from certificateStatus where ocspLastUpdated < now() and not isExpired and right(serial,1) in ('0','1','2','3') order by ocspLastUpdated ASC limit 10;
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.282663774,
    "table": {
      "table_name": "certificateStatus",
      "partitions": ["p_20210426", "p_20210526", "p_20210625", "p_20210725"],
      "access_type": "ref",
      "possible_keys": ["isExpired_ocspLastUpdated_idx"],
      "key": "isExpired_ocspLastUpdated_idx",
      "key_length": "2",
      "used_key_parts": ["isExpired"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 16421927,
      "r_rows": 31,
      "r_table_time_ms": 0.233537952,
      "r_other_time_ms": 0.038879623,
      "filtered": 29.71551704,
      "r_filtered": 32.25806452,
      "attached_condition": "certificateStatus.isExpired <=> 0 and right(certificateStatus.`serial`,1) in ('0','1','2','3') and certificateStatus.ocspLastUpdated < '2021-09-02 01:59:28'"
    }
  }
}
jcjones commented 3 years ago

I think I'll revise my earlier statement and say this is probably of higher importance than #5620.