Stream logs directly to S3 instead of storing them in the DB

rust-lang / crater

Run experiments across parts of the Rust ecosystem!

https://crater.rust-lang.org


Stream logs directly to S3 instead of storing them in the DB #295

Open pietroalbini opened 6 years ago

pietroalbini commented 6 years ago

The Crater database keeps growing because it stores all the logs before uploading them in batch to S3. Even deleting them wouldn't easily solve the problem, since the disk space stays allocated until a VACUUM, which is really expensive.

We should upload the logs to S3 as soon as we receive them.

ishitatsuyuki commented 5 years ago

This doesn't have to be streaming, but yes, we should not store the logs in an RDBMS.

I'm not sure about the internal structures, but the process would look like this:

1. Upload the file to S3
2. Store a reference to it (the filename) in the database
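The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Crater's actual code: `ObjectStore`, `MemoryStore`, `LogIndex`, `store_log`, and the `experiment/crate.txt` key layout are all hypothetical names invented for the example, and the in-memory store stands in for a real S3 client.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an S3 client. A real implementation would
/// perform a network upload here and could fail.
trait ObjectStore {
    fn put(&mut self, key: &str, body: &[u8]) -> Result<(), String>;
}

/// In-memory store, used only to keep this sketch self-contained.
#[derive(Default)]
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for MemoryStore {
    fn put(&mut self, key: &str, body: &[u8]) -> Result<(), String> {
        self.objects.insert(key.to_string(), body.to_vec());
        Ok(())
    }
}

/// Hypothetical stand-in for the database table: it maps
/// (experiment, crate) to the object key instead of holding the log itself.
#[derive(Default)]
struct LogIndex {
    keys: HashMap<(String, String), String>,
}

/// Step 1: upload the log. Step 2: record only its key.
/// If the upload fails, the `?` returns early and no reference is stored,
/// so the index never points at a missing object.
fn store_log<S: ObjectStore>(
    store: &mut S,
    index: &mut LogIndex,
    experiment: &str,
    krate: &str,
    log: &[u8],
) -> Result<String, String> {
    // Assumed key layout, chosen for illustration only.
    let key = format!("{}/{}.txt", experiment, krate);
    store.put(&key, log)?;
    index
        .keys
        .insert((experiment.to_string(), krate.to_string()), key.clone());
    Ok(key)
}
```

Uploading before recording the reference means a failed upload leaves the database unchanged. What the caller should do with the returned error is left open here.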

pietroalbini commented 5 years ago

Yep, that was my plan. My only doubt is what Crater should do if the upload to S3 fails. At the moment this is not a problem, because all the logs are batch-uploaded to S3 during report generation, and if the upload fails, the whole report generation fails, showing an error and allowing a manual retry.