Currently, all S3 bucket syncs treat the entire bucket like a flat directory. While this is the nature of S3 buckets, treating "/" characters as separators between "sub-folders" in the bucket could massively improve performance. The `Minio.list_objects` call in the S3 bucket task specifies `recursive=True`: https://github.com/irods/irods_capability_automated_ingest/blob/ec34cb160e55b3d479c3a9796e5118721757f451/irods_capability_automated_ingest/tasks/s3_bucket_sync.py#L122

This should probably be `False`, but that would require a lot of other changes.

Additionally, this would greatly improve the potential implementation of #282. As it stands, that implementation requires a query that holds all of the data objects under the target collection. This would mean that the entire S3 bucket is being held in memory (possibly - depends on the implementation of `Minio.list_objects`) and the entire target collection's contents as well, which could potentially be very large.
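
To make the idea concrete, here is a minimal sketch of a per-prefix walk. It is not the existing task code. `walk_bucket` is a hypothetical helper, and `client` is assumed to behave like `minio.Minio`: `list_objects(bucket_name, prefix=..., recursive=False)` yields entries with `object_name` and `is_dir`, where `is_dir` marks a "sub-folder" prefix.

```python
from collections import deque
from typing import Iterator, List, Tuple


def walk_bucket(client, bucket_name: str, prefix: str = "") -> Iterator[Tuple[str, List[str]]]:
    """Yield (prefix, object_names) for one "sub-folder" at a time.

    Hypothetical helper: `client` is assumed to expose
    list_objects(bucket_name, prefix=..., recursive=...) like minio.Minio.
    """
    pending = deque([prefix])  # prefixes still to be listed
    while pending:
        current = pending.popleft()
        names = []
        # recursive=False lists only one level below `current`
        for entry in client.list_objects(bucket_name, prefix=current, recursive=False):
            if entry.object_name == current:
                # Folder-marker object for the prefix itself; skipping it
                # avoids queueing the same prefix again.
                continue
            if entry.is_dir:
                pending.append(entry.object_name)  # visit this sub-folder later
            else:
                names.append(entry.object_name)
        yield current, names
```

With this shape, only one prefix's object names are held at a time, so each batch could in principle be compared against just the matching sub-collection rather than the entire target collection. How this maps onto the existing task structure is left open here.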