irods / irods_capability_automated_ingest


scanner.py and sync_irods.py refactor #211

Open avkrishnamurthy opened 12 months ago

avkrishnamurthy commented 12 months ago

In #209, multiread and multiwrite from S3 into iRODS were implemented. During that work, it was suggested that the scanner object from scanner.py be used to upload and sync files. This makes sense: a scanner object is already created earlier in the process, and it knows which kind of source is being ingested (S3 or the local filesystem), so routing the work through it avoids repeatedly checking whether a file comes from S3 or the local filesystem. Some initial steps have been taken, and file uploads are currently routed through the scanner for both S3 and local filesystem files.

One issue with this implementation is that iRODS-specific logic has now been pulled into the scanner rather than kept separate as it was before, which risks tangled dependencies. In addition, a scanner object is currently passed from the tasks in sync_task.py to the functions in sync_irods.py. It should be possible to avoid this middle step when doing the upload, and sync_irods.py may no longer be needed at all.
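As a rough illustration of one way to keep the two concerns apart, here is a minimal sketch in which the scanner knows only how to read from its source and the iRODS-facing code sits behind a small interface. Every name in it (`Scanner`, `FilesystemScanner`, `S3Scanner`, `Target`, `InMemoryTarget`, `read`, `upload`, `put`) is hypothetical and is not taken from scanner.py, sync_task.py, or sync_irods.py. The S3 client and the iRODS connection are replaced by stand-ins so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Protocol


class Target(Protocol):
    """Hypothetical destination interface standing in for the iRODS-facing code."""

    def put(self, logical_path: str, data: bytes) -> None: ...


class Scanner(ABC):
    """Hypothetical base class: a scanner only knows how to read its own source."""

    @abstractmethod
    def read(self, source_path: str) -> bytes:
        """Return the contents of source_path from this scanner's source."""

    def upload(self, source_path: str, logical_path: str, target: Target) -> None:
        # The source type was decided when the scanner was created,
        # so no S3-versus-filesystem check is needed here.
        target.put(logical_path, self.read(source_path))


class FilesystemScanner(Scanner):
    def read(self, source_path: str) -> bytes:
        with open(source_path, "rb") as f:
            return f.read()


class S3Scanner(Scanner):
    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # 'fetch' is an injected callable standing in for a real S3 client (assumption).
        self._fetch = fetch

    def read(self, source_path: str) -> bytes:
        return self._fetch(source_path)


class InMemoryTarget:
    """Stand-in for an iRODS session; it just records what was uploaded."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, logical_path: str, data: bytes) -> None:
        self.objects[logical_path] = data
```

With this shape, a task could call `scanner.upload(path, logical_path, target)` directly, without an intermediate module that receives the scanner as an argument. The scanner depends only on the narrow `Target` interface, so the iRODS details stay out of scanner.py.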