Closed peppy closed 6 months ago
As a status update here, the actual partition structure (in terms of included columns) has been finalised in #17. What remains is rotation considerations, aka how to split up the partition ranges using updated_at
, and where the rotation process is run for cycling.
Found out we can't use TIMESTAMP
in a RANGE COLUMNS
partition schema.
CREATE TABLE `scores` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`user_id` int unsigned NOT NULL,
`beatmap_id` mediumint unsigned NOT NULL,
`ruleset_id` smallint unsigned NOT NULL,
`data` json NOT NULL,
`has_replay` tinyint(1) DEFAULT '0',
`preserve` tinyint(1) NOT NULL DEFAULT '0',
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`unix_updated_at` int NOT NULL DEFAULT (unix_timestamp()),
PRIMARY KEY (`id`,`preserve`,`updated_at`),
KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,`unix_updated_at`)
(PARTITION p20231215 VALUES LESS THAN (0,1702512000) ENGINE = InnoDB,
PARTITION p20231216 VALUES LESS THAN (0,1702598400) ENGINE = InnoDB,
PARTITION p20231217 VALUES LESS THAN (0,1702684800) ENGINE = InnoDB,
PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (MAXVALUE,MAXVALUE) ENGINE = InnoDB) */
This might be the new best-we-can-do.
Above has been deployed and is rotating successfully.
I setup partitioning to work with my last major effort on this table, but two pieces are missing.
Partition scheme includes
ruleset_id
but it is not usedOriginally we were looking to partition per ruleset, which seems like a great idea from a performance angle. As such, this was included in the partition scheme, but it hasn't actually been used in the constructed partitions.
I want to test the overhead having
ruleset_id
in the primary key adds, and if it is negligible then we should leave it there and ensure the constructed partitions are per-ruleset.(will be tested in #17)
Conclusion: Including ruleset in the partitioning schema is not a good direction, in terms of future extensibility and complexity in partition rotation.
Partition rotation is not yet implemented
Currently we have only two partitions –
preserve=0
andpreserve=1
. The plan is to add partition rotation for thep0
case, but this hasn't been done yet. It still seems like a good idea, but comes with a concern:We are going in a direction where a score can be switched between
preserve=0
andpreserve=1
. We also have tooling for removingpreserve=0
scores after a certain delay and when all criteria are correct (see https://github.com/ppy/osu-queue-score-statistics/issues/141).So, what happens when a score becoming
p=0
and falls into a partition that is about to be rotated for cleanup? If the cleanup process is guaranteeing a time window before scores are cleaned up after the lastpreserve
flag switch, then they may end up getting rotated out of existence too soon. An example would be a user unpinning a score, then realising they want to pin it again a few minutes later. With unfortunate timing it would be lost during this period.A solution may be to change the partitioning to be on
updated_at
instead ofcreated_at
and ensure we are updatingupdated_at
on any row change.Partition rotation should be considered for other tables
solo_score_tokens