sifive / wake

The SiFive wake build tool

rsc: Don't track unnecessary visible files #1612

Closed V-FEXrt closed 1 month ago

V-FEXrt commented 1 month ago

Visible files were being tracked in the database, but their full definitions aren't actually needed in the cache. It is sufficient to use them to calculate the job hash and then discard them.

As most jobs have many visible files, this will significantly increase the speed of job eviction and significantly decrease the size of the database.
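The idea of hashing visible files into the key and then dropping them can be sketched as below. This is a minimal illustration and not the rsc implementation: the `JobRequest`, `VisibleFile`, `StoredJob`, and `Cache` types are hypothetical, an in-memory map stands in for the database, and the standard library's `DefaultHasher` stands in for whatever hash the real service uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical request types: clients still send visible files.
#[derive(Hash)]
pub struct VisibleFile {
    pub path: String,
    pub hash: String,
}

#[derive(Hash)]
pub struct JobRequest {
    pub cmd: Vec<String>,
    pub cwd: String,
    pub visible_files: Vec<VisibleFile>,
}

/// Hypothetical stored record: note that it holds no visible files.
pub struct StoredJob {
    pub outputs: Vec<String>,
}

/// Compute the job key from the whole request, visible files included.
/// DefaultHasher is only a stand-in for a real content hash.
pub fn job_key(req: &JobRequest) -> u64 {
    let mut hasher = DefaultHasher::new();
    req.hash(&mut hasher);
    hasher.finish()
}

/// In-memory stand-in for the database, keyed by the job hash alone.
pub struct Cache {
    jobs: HashMap<u64, StoredJob>,
}

impl Cache {
    pub fn new() -> Self {
        Cache { jobs: HashMap::new() }
    }

    /// Visible files feed the key and are then discarded.
    pub fn insert(&mut self, req: &JobRequest, outputs: Vec<String>) -> u64 {
        let key = job_key(req);
        self.jobs.insert(key, StoredJob { outputs });
        key
    }

    /// Lookups recompute the key from the request, so a changed
    /// visible file still produces a cache miss.
    pub fn lookup(&self, req: &JobRequest) -> Option<&StoredJob> {
        self.jobs.get(&job_key(req))
    }
}
```

Because the key already encodes every visible file, a lookup with a modified visible file misses even though nothing about that file was stored.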

V-FEXrt commented 1 month ago

> Do we need to do any work to ensure older clients that are still sending the visible files over don't do anything weird?

The API still requires clients to send the visible files as part of the request so that we can correctly calculate the hash. Visible files are still part of the key; we just don't need to store them in the database. Storing the key (hash) is sufficient.

> Also, did you do any profiling on how much this helps evictions?

No, but I did see a performance increase on insertion. I'll run some profiling on eviction.

> You don't mention it, but it seems like this would also speed up job insertion since there is less to keep track of.

Ah yeah, I 100% meant to mention that.

> Also, git grep shows that there is some more code to be removed in rust/rsc/src/bin/rsc/(types|main).rs

See the first response: we need to keep those because the visible files must remain part of the request, just not part of the database.

V-FEXrt commented 1 month ago

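Oh wow! Yeah, quite a large speedup: 20.9 seconds vs 1.3 seconds to delete 344 jobs. Nearly all of the time in the first run is spent in the fk-visible_file-job constraint trigger.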

test=# select count(*) from job;
 count 
-------
   344
(1 row)

test=# EXPLAIN ANALYZE DELETE FROM job;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Delete on job  (cost=0.00..44.39 rows=0 width=0) (actual time=0.758..0.759 rows=0 loops=1)
   ->  Seq Scan on job  (cost=0.00..44.39 rows=339 width=6) (actual time=0.004..0.060 rows=344 loops=1)
 Planning Time: 0.044 ms
 Trigger for constraint fk-visible_file-job: time=19673.332 calls=344
 Trigger for constraint fk-output_file-job: time=1216.633 calls=344
 Trigger for constraint fk-output_file-job: time=5.373 calls=344
 Trigger for constraint fk-output_file-job: time=70.707 calls=344
 Trigger for constraint fk-job-use-job: time=2.451 calls=344
 Execution Time: 20969.768 ms
(9 rows)

test=# select count(*) from job;
 count 
-------
     0
(1 row)

vs

test=# select count(*) from job;
 count 
-------
   344
(1 row)

test=# EXPLAIN ANALYZE DELETE FROM job;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Delete on job  (cost=0.00..46.54 rows=0 width=0) (actual time=0.845..0.846 rows=0 loops=1)
   ->  Seq Scan on job  (cost=0.00..46.54 rows=354 width=6) (actual time=0.006..0.092 rows=344 loops=1)
 Planning Time: 0.061 ms
 Trigger for constraint fk-output_file-job: time=1245.098 calls=344
 Trigger for constraint fk-output_file-job: time=4.393 calls=344
 Trigger for constraint fk-output_file-job: time=66.585 calls=344
 Trigger for constraint fk-job-use-job: time=1.672 calls=344
 Execution Time: 1318.919 ms
(8 rows)

test=# select count(*) from job;
 count 
-------
     0
(1 row)
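As a quick check on the two plans above, the arithmetic can be written out. The constants are copied from the `Execution Time` and `fk-visible_file-job` trigger lines; the function names are made up for this sketch. The result is roughly a 15.9x speedup, with the visible-file trigger accounting for about 94% of the original delete time.

```rust
/// Timings in milliseconds, copied from the EXPLAIN ANALYZE output above.
pub const BEFORE_TOTAL_MS: f64 = 20969.768;
pub const AFTER_TOTAL_MS: f64 = 1318.919;
pub const VISIBLE_FILE_TRIGGER_MS: f64 = 19673.332;

/// How many times faster the second run is than the first.
pub fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

/// Fraction of a total spent in one component.
pub fn share(part_ms: f64, total_ms: f64) -> f64 {
    part_ms / total_ms
}
```

For example, `speedup(BEFORE_TOTAL_MS, AFTER_TOTAL_MS)` gives the overall ratio and `share(VISIBLE_FILE_TRIGGER_MS, BEFORE_TOTAL_MS)` gives the fraction of the first run spent in the visible-file trigger.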