lsst-uk / csd3-echo-somerville

Code to backup from CSD3 to Echo S3, curate at STFC cloud and expose to Somerville
Apache License 2.0

SLOW processing - many small subfolders #71

Closed davedavemckay closed 1 month ago

davedavemckay commented 1 month ago

Where many folders (1000s) sit under a common parent folder, each one is processed into its own zip. This is fine when each folder holds 100s of MiB, but very slow when each holds only 10s of KiB, which is common. Such folders should be collated into a single zip one level up the tree, meaning 1 upload instead of 1000s.
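As a rough illustration of the idea (not the code in lsst-backup.py), the sketch below packs every file under a parent folder into one in-memory zip, keeping the subfolder layout as relative paths inside the archive. The function name zip_subfolders_together is hypothetical, and the whole archive is assumed to fit in memory.

```python
import io
import os
import zipfile


def zip_subfolders_together(parent_folder):
    """Pack every file below parent_folder into a single in-memory zip.

    Hypothetical helper for illustration only; it is not part of
    lsst-backup.py.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(parent_folder):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to the parent so that the
                # subfolder structure can be restored on extraction.
                archive.write(full_path, os.path.relpath(full_path, parent_folder))
    # One bytes object, so one upload, however many subfolders there were.
    return buffer.getvalue()
```

With 1000s of tiny subfolders, this would produce one object to upload rather than one per subfolder, at the cost of holding the archive in memory.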

Look at lines 872 to 1040 in lsst-backup.py. Extend the to_collate dict to include the total_size of each parent_folder, together with its parent_parent_folder. This will allow small parent_folders that share a common parent_parent_folder to be aggregated.
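A minimal sketch of what the extended structure might look like follows. The real layout of to_collate in lsst-backup.py is not shown in this issue, so the dict keys, the function names build_to_collate and group_small_folders, and the max_folder_size parameter are all assumptions made for illustration.

```python
import posixpath
from collections import defaultdict


def build_to_collate(file_sizes):
    """Build a to_collate-style dict from {file path: size in bytes}.

    Assumed layout (not taken from lsst-backup.py): one entry per
    parent_folder, recording its files, their total_size, and the
    parent_parent_folder one level further up.
    """
    to_collate = {}
    for path, size in file_sizes.items():
        parent_folder = posixpath.dirname(path)
        entry = to_collate.setdefault(parent_folder, {
            "parent_parent_folder": posixpath.dirname(parent_folder),
            "files": [],
            "total_size": 0,
        })
        entry["files"].append(path)
        entry["total_size"] += size
    return to_collate


def group_small_folders(to_collate, max_folder_size):
    """Group small parent_folders by their shared parent_parent_folder.

    max_folder_size is a hypothetical threshold in bytes.
    """
    groups = defaultdict(list)
    for parent_folder, entry in to_collate.items():
        if entry["total_size"] <= max_folder_size:
            groups[entry["parent_parent_folder"]].append(parent_folder)
    # Aggregation only helps when several small folders share a grandparent.
    return {gp: sorted(folders) for gp, folders in groups.items() if len(folders) > 1}
```

For example, given files under data/run/a, data/run/b and data/run/c, where only c is large, group_small_folders would return a and b grouped under data/run, and c would continue to be zipped on its own.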