timotheeg / nestrischamps

A web-based OCR and restreamer system for NES Classic Tetris players
MIT License

Script to clean S3 bucket #140

Closed by timotheeg 1 year ago

timotheeg commented 1 year ago

We sometimes have files stored in S3 without a corresponding entry in the DB that references them.

This PR introduces a script that crawls through the S3 bucket and removes files that are not referenced by any DB entry.
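A minimal sketch of the reconciliation step, assuming each page of S3 keys is checked against the DB in chunks of 250. The chunk size is inferred from the `(243/250)` counters in the sample output, where the first number appears to be 250 minus the unlisted count. `chunks`, `find_unlisted` and `referenced_keys` are hypothetical names, and a Python set stands in for the DB lookup; this is not the script's actual code.

```python
from typing import Iterator

CHUNK_SIZE = 250  # assumption: inferred from the "/250" in the sample output


def chunks(items: list[str], size: int) -> Iterator[list[str]]:
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_unlisted(page_keys: list[str], referenced_keys: set[str],
                  size: int = CHUNK_SIZE) -> Iterator[list[str]]:
    """Yield, for each chunk, the S3 keys that no DB entry references.

    A set stands in for the DB here (hypothetical); the real script would
    presumably run one lookup query per chunk instead.
    """
    for chunk in chunks(page_keys, size):
        unlisted = [key for key in chunk if key not in referenced_keys]
        listed = len(chunk) - len(unlisted)
        print(f"Found {len(unlisted)} unlisted files ({listed}/{len(chunk)})")
        yield unlisted
```

For a 1000-key page this performs four lookups, which would match the four "Found ..." lines printed per list query in the sample output.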

The script uses the batch API from the aws-sdk, so it is "reasonably" fast: it can crawl over 200k entries in a few minutes.
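The PR does not say exactly which batch calls the script makes. The sketch below shows a crawl loop under two assumptions. First, listing follows the S3 ListObjectsV2 model: at most 1000 keys per page, resumed from where the previous page ended, which would fit the "List Query N from <token>" lines. Second, deletes are grouped into batches of at most 1000 keys, the documented per-request limit of S3 DeleteObjects. `FakeBucket`, `list_page`, `delete_batch` and `crawl` are hypothetical in-memory stand-ins, not aws-sdk calls. The fake also resumes after the last key returned rather than using an opaque continuation token.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 1000     # ListObjectsV2 returns at most 1000 keys per call
DELETE_BATCH = 1000  # DeleteObjects accepts at most 1000 keys per request


@dataclass
class FakeBucket:
    """Hypothetical in-memory stand-in for an S3 bucket: key -> size in bytes."""
    objects: dict[str, int]
    delete_calls: int = 0

    def list_page(self, after: Optional[str]) -> tuple[list[str], Optional[str]]:
        """Return up to PAGE_SIZE keys sorted after `after`, plus a resume marker (None when done)."""
        keys = sorted(k for k in self.objects if after is None or k > after)
        page = keys[:PAGE_SIZE]
        return page, (page[-1] if len(keys) > PAGE_SIZE else None)

    def delete_batch(self, keys: list[str]) -> int:
        """Delete up to DELETE_BATCH keys in one call and return the bytes freed."""
        if len(keys) > DELETE_BATCH:
            raise ValueError("too many keys for a single batch delete")
        self.delete_calls += 1
        return sum(self.objects.pop(key) for key in keys)


def crawl(bucket: FakeBucket, referenced: set[str]) -> int:
    """Delete every key absent from `referenced`, batching deletes; return bytes freed."""
    pending: list[str] = []
    freed = 0
    marker: Optional[str] = None
    while True:
        page, marker = bucket.list_page(marker)
        pending.extend(key for key in page if key not in referenced)
        # Flush full batches as soon as they are available.
        while len(pending) >= DELETE_BATCH:
            batch, pending = pending[:DELETE_BATCH], pending[DELETE_BATCH:]
            freed += bucket.delete_batch(batch)
        if marker is None:
            break
    if pending:  # flush the final partial batch
        freed += bucket.delete_batch(pending)
    return freed
```

Anchoring each page on the last key already listed means that deleting listed objects mid-crawl does not shift the pages still to come. Grouping deletes into 1000-key requests keeps the number of round trips to about one per 1000 orphaned files.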

Sample output:

[...]
List Query 185 from 1dNysG3z+yVnrDjgzVL8p6QAORYjo0ZiHTxV6sZNKlWasZ8aAT7qrGHjpMF//weiE4VSuhmKEtiLMZKXDoFuAM8OQx1lFcRVX
Found 7 unlisted files (243/250)
Found 3 unlisted files (247/250)
Found 11 unlisted files (239/250)
Found 4 unlisted files (246/250)
Scheduled for deletion 681
List Query 186 from 1E1bvAgjIppYbZc3kc9IAu1AvW3CQVDZmULLHv34XpJTAAVQrFozCQRuiZC0nVEmvvYBbAVjQYtJ87LPBcoXXiRsQucmrHmG7
Found 9 unlisted files (241/250)
Found 54 unlisted files (196/250)
Found 83 unlisted files (167/250)
Found 10 unlisted files (240/250)
Scheduled for deletion 837
List Query 187 from 1PuG9T2TFPge8p6aJAr2lCzCkdHBkXX+AlWVUy2sZVuONx08Jmej/0zamBWpCxZgTJ//TFHZO6uRx6rBUJGQBcVlhzc40KlHq
Found 22 unlisted files (228/250)
Found 26 unlisted files (224/250)
Found 41 unlisted files (209/250)
Found 63 unlisted files (187/250)
Scheduled for deletion 989
List Query 188 from 1u+pVgiKPDbJc63qNkjwjmHaKgu+uD8qMJ2oVp0mq/vvWbS7YJLoxyEWrgQdmKeLkSMZhU0UgwDnetcoF9CEo6vDSFFX8G48D
Found 23 unlisted files (227/250)
Deleting 1000 to restitute 1505121
Found 8 unlisted files (242/250)
Found 9 unlisted files (241/250)
Found 27 unlisted files (223/250)
Scheduled for deletion 56
List Query 189 from 1BKVXifMTiF8J9lfNfFismbZpOtNpUPKq6w4L0n40Xgs1zt8BZD78FlWXugWne+y4JFmhJCZw9E2XLfE31l7L7NzxcpUJWT2r
Found 0 unlisted files (250/250)
Found 0 unlisted files (250/250)
Found 0 unlisted files (250/250)
Found 1 unlisted files (249/250)
Scheduled for deletion 57
[...]
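The counters in the sample output are consistent with a simple rule, inferred from the log rather than stated in the PR. Each chunk's unlisted count is added to a pending total. Whenever that total reaches 1000, one batch of 1000 is deleted. "Scheduled for deletion" then reports what is left at the end of each list query. The sketch replays list queries 185 to 189 under that rule. The starting value 656 does not appear in the log; it is derived as 681 minus the 25 files found in query 185. `replay` and `QUERIES` are illustrative names only.

```python
BATCH = 1000  # assumption: delete threshold inferred from "Deleting 1000"

# Per-chunk "Found N unlisted files" counts for list queries 185 to 189,
# copied from the sample output above.
QUERIES = {
    185: [7, 3, 11, 4],
    186: [9, 54, 83, 10],
    187: [22, 26, 41, 63],
    188: [23, 8, 9, 27],
    189: [0, 0, 0, 1],
}


def replay(pending: int, queries: dict[int, list[int]]) -> tuple[dict[int, int], int]:
    """Return the pending count after each query and the number of batches deleted."""
    scheduled: dict[int, int] = {}
    batches = 0
    for query, counts in queries.items():
        for found in counts:
            pending += found
            while pending >= BATCH:  # delete a full batch as soon as one is available
                pending -= BATCH
                batches += 1
        scheduled[query] = pending
    return scheduled, batches


scheduled, batches = replay(656, QUERIES)  # 656 = 681 - 25, derived
print(scheduled)  # → {185: 681, 186: 837, 187: 989, 188: 56, 189: 57}
```

The single flush happens right after the first chunk of query 188 (989 + 23 = 1012, leaving 12). This matches where the "Deleting 1000" line appears in the log, and it suggests the threshold is checked after every chunk rather than once per list query.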