ncihtan / HTAN-data-curator

HTAN Data Ingest Shiny App
https://sagebio.shinyapps.io/HTAN-data-curator/
Apache License 2.0

Reinstate files accidentally moved in table transfer #28

Closed: aclayton555 closed this issue 2 years ago

aclayton555 commented 2 years ago

Describe the bug

Filing this ticket amid multiple outages (Adam, Mialy, Milen). This stems from a discussion in the dcc_operations HTAN Slack channel starting at this message: https://htanworkspace.slack.com/archives/C03E345AMLK/p1661186303347379. In summary, it looks like quite a few of the Vanderbilt center's data files are missing (see this empty folder for an example); however, it is not clear whether this is the full extent of the "missing" files. @clarisse-lau spot-checked a few single-cell files, and they appear to still be in the bucket, e.g. s3://htan-dcc-vanderbilt/3398375/c207257b-c84a-49f8-ac2b-2c08e7acb2ee/3247-AS-3-ACAGTG_S3_R1_001.fastq.gz. Of note, the error message says "no access" rather than something like "doesn't exist." Looking into the issue further, @adamjtaylor determined that the "missing" files appear to have been swept into the HTAN Entity Archive project, which was meant to be the destination only for the empty folder entities used to store clinical and biospecimen information during our transfer to tables.

Action Required

The entity IDs and their pointers to the S3 bucket have not changed, so the entities listed in the manifest just need to be moved back into the correct Synapse folder. Can FAIR please help with reinstating these? It looks like @mialy-defelice performed the most recent modifications to the manifest, but please triage accordingly given Mialy's outage.


vthorsson commented 2 years ago

Is there progress on finding a route or person to work on this one, @milen-sage @aclayton555? It sounds like the problem has been identified and hopefully no data files were lost, but even the outward appearance of missing data can (justifiably) be concerning to people.

adamjtaylor commented 2 years ago

I'm back from leave and will move the Vanderbilt files that I can identify back into place as a first fix. We can then more rigorously identify any other files that were archived and make sure this does not happen in subsequent moves of projects' clinical and biospecimen records to tables.
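One possible guard for future moves, sketched under assumptions: before a record folder is sent to the archive, check recursively that it contains no file entities, and skip it otherwise. The `Tree` representation and the `contains_files` and `folders_safe_to_archive` helpers are hypothetical, not the actual transfer script; in practice the tree might be populated from the Synapse Python client's `getChildren` results, whose `type` field uses concrete type strings like the two constants below.

```python
from typing import Dict, List, Tuple

# Concrete Synapse entity type strings (as reported in child listings).
FILE_TYPE = "org.sagebionetworks.repo.model.FileEntity"
FOLDER_TYPE = "org.sagebionetworks.repo.model.Folder"

# Hypothetical representation: container ID -> list of (child ID, child type).
Tree = Dict[str, List[Tuple[str, str]]]


def contains_files(folder_id: str, tree: Tree) -> bool:
    """Return True if the folder, or any subfolder beneath it, holds a file entity."""
    for child_id, child_type in tree.get(folder_id, []):
        if child_type == FILE_TYPE:
            return True
        if child_type == FOLDER_TYPE and contains_files(child_id, tree):
            return True
    return False


def folders_safe_to_archive(candidates: List[str], tree: Tree) -> List[str]:
    """Keep only candidate folders that are free of files and can be archived."""
    return [folder_id for folder_id in candidates if not contains_files(folder_id, tree)]
```

Checking recursively matters because a record folder that looks empty at the top level can still hold data files inside a nested subfolder.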

adamjtaylor commented 2 years ago

I have a notebook set up to search manifests for out-of-place files and move them back. Unfortunately, it looks like the entity archive project was set up with only @mialy-defelice as admin, with @milen-sage and me having only download privileges, not move privileges.
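A minimal sketch of the search-and-move step, assuming each manifest row carries an `entityId` column, that the current parent of every listed entity has already been looked up, and that the IDs of the archive project and its folders are known. `plan_reinstatement`, `apply_plan`, and all `syn…` IDs used with them are hypothetical; this is not the notebook or Gist referenced in this thread.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# (entity ID, current parent ID, target parent ID)
Move = Tuple[str, str, str]


def plan_reinstatement(
    manifest_rows: Iterable[Dict[str, str]],
    target_folder: str,
    current_parent: Dict[str, str],
    archive_containers: Set[str],
) -> List[Move]:
    """List manifest entities that currently sit in the archive, with their target folder."""
    plan: List[Move] = []
    for row in manifest_rows:
        entity_id = row.get("entityId", "")
        parent = current_parent.get(entity_id)
        # Only entities parked in the archive need to move; everything else is left alone.
        if entity_id and parent in archive_containers:
            plan.append((entity_id, parent, target_folder))
    return plan


def apply_plan(plan: List[Move], move_fn: Callable[[str, str], None]) -> int:
    """Call move_fn(entity_id, target_parent) for each planned move and return the count."""
    for entity_id, _old_parent, target in plan:
        move_fn(entity_id, target)
    return len(plan)
```

Keeping the actual move behind a `move_fn` callable lets the plan be reviewed or dry-run before anything is touched. In a real run it could wrap the Synapse Python client's `Synapse.move(entity, new_parent)`, which needs move permission on the archive project, the privilege that was missing here.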

Unfortunately, the file reinstatement will have to wait until @mialy-defelice is back from leave on Monday the 29th.

vthorsson commented 2 years ago

Thanks @adamjtaylor !

milen-sage commented 2 years ago

Reaching out to the Synapse team to check how we can add me and @adamjtaylor as admins on the HTAN Entity Archive project.

milen-sage commented 2 years ago

@mialy-defelice changed access permissions on the project - we are good to go. Thanks @mialy-defelice!!

adamjtaylor commented 2 years ago

Thanks @mialy-defelice! I will run the scripts to check for and move missing files later this morning. @vthorsson I will let you know when complete.

vthorsson commented 2 years ago

Thanks @mialy-defelice and @milen-sage . I am still not seeing the files though.

[Screenshot: 2022-08-30 at 12:58:44]
vthorsson commented 2 years ago

Ah thanks @adamjtaylor for doing the additional step needed

adamjtaylor commented 2 years ago

Confirmed that I can move files back into place. The functions used are archived in this Gist.

I believe only Vanderbilt is affected by this issue.

I will update this comment as Vanderbilt entities are moved back into place. [Now complete]

vthorsson commented 2 years ago

Thanks @adamjtaylor !

adamjtaylor commented 2 years ago

Thanks @vthorsson. This is now complete for all of Vanderbilt's manifests that relate to files rather than records.

I'll close this issue. @milen-sage, if you could add a comment noting where the fix in the transfer script is being tracked, that would be great.

milen-sage commented 2 years ago

Fix tracked here - we are working on it this sprint.