Open sec122 opened 2 months ago
Hey team! Please add your planning poker estimate with Zenhub @bess @hectorcorrea @JaymeeH @leefaisonr
@sec122 In the case of 479 mentioned above, the only difference I see is in the README file: the copy starting with "globus_" shows some character encoding errors when viewed in a browser (for me, at least). If you don't see any substantial differences in content between the two, then I'd recommend we go with the one starting with "dataspace".
@sec122 As for 442, that's more mysterious. Here are the options I see:
One way or another, RDSS will need clear guidance from PRDS about what to keep and what to delete (if anything).
@carolyncole Here is a set of items that just need the duplicate files and prefixes removed (see the notes in our spreadsheet under the column "actions remaining" for specifics about each):
@sec122 Those updates are completed now.
Duplicate Files in Migrated Items
Expected behavior
Only one copy of each file should be present in migrated objects.
Actual behavior
Some files are duplicated in migrated datasets. The duplicates are easy to spot because their filenames carry a prefix of either "dataspace" or "globus".
These duplicated file cases fall into three groups:
a) Only the README files are duplicated - 22 cases
b) All files are duplicated (and there are TAR files) - 6 cases. In a separate ticket, #1920, Matt has more info about how we want to handle these cases
c) All files are duplicated (and there are no TAR files) - 1 case
Steps to replicate
View the full list of items (color coded in red) on the NEEDS ATTENTION tab of the "Copy of RDOS Records in DataSpace" Google Sheet: https://docs.google.com/spreadsheets/d/130B7RMhnqSeTIKPFBdDsrSVbC1C_PCdZp0qwTucR0QA/edit?usp=sharing
See the rows with Issue type = "Duplicate Files beyond Readmes" and "Duplicate Readmes Only" for specific examples and links to the records.
Impact of this bug
We cannot approve these datasets until the issue is fixed. Therefore, these records remain in DataSpace until the issue is resolved.
Honeybadger link and code snippet, if applicable
Implementation notes, if any
I believe @carolyncole may already have a script to take care of these issues, since she fixed a very similar issue for us earlier in the migration. I'm unsure whether this requires writing a new, similar script or rerunning the existing one.
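For illustration only, here is a minimal Python sketch of how duplicates could be detected from a work's file list and sorted into the three groups described above. It is not the existing script. The exact prefix strings (`dataspace_`, `globus_`), the function names, and the rule that a README is any base name starting with "readme" are all assumptions. It only reports duplicates and deletes nothing.

```python
from collections import defaultdict

# Assumed prefixes: the issue says filenames start with "dataspace" or "globus";
# the trailing underscore on "dataspace_" is a guess.
PREFIXES = ("dataspace_", "globus_")


def strip_prefix(filename):
    """Return (prefix, base name); prefix is None when no known prefix matches."""
    for prefix in PREFIXES:
        if filename.startswith(prefix):
            return prefix, filename[len(prefix):]
    return None, filename


def find_duplicates(filenames):
    """Map each base name to the filenames sharing it, keeping only repeated ones."""
    groups = defaultdict(list)
    for name in filenames:
        _, base = strip_prefix(name)
        groups[base].append(name)
    return {base: names for base, names in groups.items() if len(names) > 1}


def classify(filenames):
    """Place a work into one of the three groups (simplified, hypothetical rule)."""
    dupes = find_duplicates(filenames)
    if not dupes:
        return "no duplicates"
    # Assumption: a README is any file whose base name starts with "readme".
    if all(base.lower().startswith("readme") for base in dupes):
        return "readme only"
    has_tar = any(name.lower().endswith(".tar") for name in filenames)
    return "all files, with TAR" if has_tar else "all files, no TAR"
```

A report like this would still need a human decision on which copy to keep for each work before anything is removed.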
Acceptance criteria
https://datacommons.princeton.edu/describe/works/353 - No files matched the globus_
https://datacommons.princeton.edu/describe/works/38 - moved to #1920
https://pdc-describe-prod.princeton.edu/describe/works/425 - moved to #1920
https://pdc-describe-prod.princeton.edu/describe/works/429 - moved to #1920
https://pdc-describe-prod.princeton.edu/describe/works/431 - moved to #1920