V-FEXrt closed 1 month ago
Do we need to do any work to ensure older clients that are still sending the visible files over don't do anything weird?
The API still requires clients to send the visible files as part of the request so that we can correctly calculate the hash. Visible files are still part of the key; we just don't need to store them in the database. Storing the key (hash) is sufficient.
Also did you do any profiling on how much this helps evictions?
No, though I did see a performance increase on insertion. I'll run some profiling on eviction.
You don't mention it but it seems like this would also speed up job insertion since there is less to keep track of.
Ah yeah, I 100% meant to mention that.
Also git grep shows that there is some more code to be removed in rust/rsc/src/bin/rsc/(types|main).rs
See the first response: we need to keep those because the visible files must stay part of the request, just not part of the database.
Oh wow! Yeah, quite a large speed-up: 20.9 seconds vs 1.3 seconds for 344 jobs.
test=# select count(*) from job;
count
-------
344
(1 row)
test=# EXPLAIN ANALYZE DELETE FROM job;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Delete on job (cost=0.00..44.39 rows=0 width=0) (actual time=0.758..0.759 rows=0 loops=1)
-> Seq Scan on job (cost=0.00..44.39 rows=339 width=6) (actual time=0.004..0.060 rows=344 loops=1)
Planning Time: 0.044 ms
Trigger for constraint fk-visible_file-job: time=19673.332 calls=344
Trigger for constraint fk-output_file-job: time=1216.633 calls=344
Trigger for constraint fk-output_file-job: time=5.373 calls=344
Trigger for constraint fk-output_file-job: time=70.707 calls=344
Trigger for constraint fk-job-use-job: time=2.451 calls=344
Execution Time: 20969.768 ms
(9 rows)
test=# select count(*) from job;
count
-------
0
(1 row)
vs
test=# select count(*) from job;
count
-------
344
(1 row)
test=# EXPLAIN ANALYZE DELETE FROM job;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Delete on job (cost=0.00..46.54 rows=0 width=0) (actual time=0.845..0.846 rows=0 loops=1)
-> Seq Scan on job (cost=0.00..46.54 rows=354 width=6) (actual time=0.006..0.092 rows=344 loops=1)
Planning Time: 0.061 ms
Trigger for constraint fk-output_file-job: time=1245.098 calls=344
Trigger for constraint fk-output_file-job: time=4.393 calls=344
Trigger for constraint fk-output_file-job: time=66.585 calls=344
Trigger for constraint fk-job-use-job: time=1.672 calls=344
Execution Time: 1318.919 ms
(8 rows)
test=# select count(*) from job;
count
-------
0
(1 row)
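For reference, the arithmetic behind the two runs above can be sketched as follows. The constants are copied from the EXPLAIN ANALYZE output; the function names are only illustrative and are not part of rsc.

```rust
// Timings copied from the two EXPLAIN ANALYZE runs above, in milliseconds.
const BEFORE_TOTAL_MS: f64 = 20969.768; // with the visible_file table
const AFTER_TOTAL_MS: f64 = 1318.919; // without the visible_file table
const VISIBLE_FILE_TRIGGER_MS: f64 = 19673.332; // fk-visible_file-job trigger
const JOBS: f64 = 344.0;

// How many times faster the delete is without visible_file rows.
fn speedup() -> f64 {
    BEFORE_TOTAL_MS / AFTER_TOTAL_MS
}

// Fraction of the original delete spent in the visible_file trigger.
fn visible_file_share() -> f64 {
    VISIBLE_FILE_TRIGGER_MS / BEFORE_TOTAL_MS
}

// Average delete cost per job for a given total.
fn per_job_ms(total_ms: f64) -> f64 {
    total_ms / JOBS
}

fn main() {
    println!("speed-up: {:.1}x", speedup());
    println!("visible_file share: {:.1}%", visible_file_share() * 100.0);
    println!("per job before: {:.2} ms", per_job_ms(BEFORE_TOTAL_MS));
    println!("per job after: {:.2} ms", per_job_ms(AFTER_TOTAL_MS));
}
```

This works out to roughly a 16x speed-up, with the fk-visible_file-job trigger accounting for about 94% of the original execution time.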
Visible files were being tracked in the database, but their full definition isn't actually needed in the cache. It is sufficient to use them to calculate the job hash and then discard them.
Since most jobs have many visible files, this will significantly speed up job eviction and significantly reduce the size of the database.
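A minimal sketch of the idea, under stated assumptions: the request still carries the visible files, they are folded into the job key, and only the key is stored. The types `VisibleFile`, `JobRequest`, and `StoredJob`, their fields, and the helper `request` are hypothetical, and `DefaultHasher` merely stands in for whatever hash the real server uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical request types; the field names are illustrative only.
#[derive(Hash)]
struct VisibleFile {
    path: String,
    content_hash: String,
}

struct JobRequest {
    cmd: String,
    visible_files: Vec<VisibleFile>,
}

// Hypothetical stored row: no visible files, only the job's results.
struct StoredJob {
    stdout: String,
}

// Fold the command and every visible file into a single key.
// DefaultHasher is an assumption standing in for the real hash function.
fn job_key(req: &JobRequest) -> u64 {
    let mut hasher = DefaultHasher::new();
    req.cmd.hash(&mut hasher);
    req.visible_files.hash(&mut hasher);
    hasher.finish()
}

// Illustrative helper that builds a request from (path, content hash) pairs.
fn request(cmd: &str, files: &[(&str, &str)]) -> JobRequest {
    JobRequest {
        cmd: cmd.to_string(),
        visible_files: files
            .iter()
            .map(|(path, content_hash)| VisibleFile {
                path: path.to_string(),
                content_hash: content_hash.to_string(),
            })
            .collect(),
    }
}

fn main() {
    let mut cache: HashMap<u64, StoredJob> = HashMap::new();

    // Insert: the visible files shape the key, then are dropped with the request.
    let first = request("gcc -c a.c", &[("a.c", "h1"), ("a.h", "h2")]);
    cache.insert(job_key(&first), StoredJob { stdout: "ok".to_string() });

    // Lookup: an identical request hashes to the same key.
    let same = request("gcc -c a.c", &[("a.c", "h1"), ("a.h", "h2")]);
    if let Some(job) = cache.get(&job_key(&same)) {
        println!("cache hit, stdout = {}", job.stdout);
    }

    // A changed visible file produces a different key, so the lookup misses.
    let changed = request("gcc -c a.c", &[("a.c", "h1"), ("a.h", "h3")]);
    println!("changed input cached: {}", cache.contains_key(&job_key(&changed)));
}
```

Because the key already encodes the visible files, a lookup needs nothing beyond the hash, which is why clients must keep sending the files even though the server no longer stores them.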