sul-dlss / dor-services-app

A Rails application exposing Digital Object Registry functions as a RESTful HTTP API
https://sul-dlss.github.io/dor-services-app/

Do not include files with a filename pattern like '.nfsXXXXX' in the bag exported to preservation #4450

Closed andrewjbtw closed 1 year ago

andrewjbtw commented 1 year ago

We have been running into issues where files with names like '.nfs00000000097b892a00000325' are treated as if they are files that should be deposited into preservation. This means that the files are placed in the bag created at `/dor/export/{druid}` and included in bagit manifests:

85e2549c08830e85e89b12868f624d42 data/content/sc0487_1995_040_B6_TalkingStories_08001.tif
8ff225843f70a7f53e58b8dcc325a09a data/content/sc0487_1995_040_B6_TalkingStories_08001.pdf
0aaf3be17363afa532425b43c45aa5ae data/content/sc0487_1995_040_B6_TalkingStories_08001.xml
b183e88ca9dde028db4b3119e47343e3 data/content/sc0487_1995_040_B6_TalkingStories_08002.tif
0902b1a08e574773eea8cb11d53fb5cf data/content/sc0487_1995_040_B6_TalkingStories_08002.pdf
11c1ddd11d6dc5bf2b5cfd6d3bface0c data/content/sc0487_1995_040_B6_TalkingStories_08002.xml
ea4e211a07cc9237d431c5c3238fbbea data/content/kf609yz9250.pdf
8c9f3c14f896418dd6c329c426ac34ee data/metadata/.nfs00000000097b892a00000325  <----------- the problem file
5964ffd2ced0b8196db013019236cefe data/metadata/cocina.json
d1be5acdcf11313bf91fa86a4e0b515c data/metadata/contentMetadata.xml
234ce46cec2f1d61034e7c3e38981879 data/metadata/versionMetadata.xml

This appears to be causing the transfer-bag step to fail, though the error message says only:

Robots::SdrRepo::PreservationIngest::ItemError: Error transferring bag (via dor_services@dor-services-worker-prod-a.stanford.edu) for druid:kf609yz9250: Transfering bag for druid:kf609yz9250 to preservation failed. STDOUT =

Based on which items this affects - Goobi deposits - I think what is happening is this:

  1. Goobi sends files to SDR, including a "stubContentMetadata.xml" file that SDR uses to create structural metadata
  2. The stubContentMetadata.xml file is placed in the "metadata" folder for the druid
  3. Workflow step takes the stubContentMetadata.xml file and generates Cocina
  4. Workflow step deletes the stubContentMetadata.xml file as it has served its purpose
  5. The NFS "lazy-delete" function leaves behind a '.nfsXXXXX' file in the metadata folder (see lazy-delete FAQ)
  6. Workflow step creates the bag to be exported to preservation
  7. The bag created in /dor/export/{druid} includes all files in the metadata folder, including '.nfsXXX'
  8. transfer-object sends the bag to preservation
  9. But the .nfsXXX file finally gets deleted
  10. transfer-object fails because a file has disappeared

Based on prior Slack discussion, we think the simplest approach to solving this at this point is to exclude the .nfsXXX file from the bag and thus from the export to preservation: https://stanfordlib.slack.com/archives/C04D8DJDRJ9/p1677862646686569
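As a rough illustration of the exclusion idea, here is a minimal Ruby sketch that lists the files in a metadata folder while skipping anything that looks like an NFS placeholder. The helper name `bag_candidate_files` and the constant are hypothetical, and the plain prefix check is an assumption made for simplicity; this is not the actual dor-services-app code.

```ruby
# Hypothetical helper for illustration only, not the dor-services-app implementation.
# Assumption: NFS lazy-delete placeholders can be recognized by a ".nfs" name prefix.
NFS_PLACEHOLDER_PREFIX = ".nfs"

# Returns the sorted names of regular files in source_dir that should be
# copied into the bag, leaving out NFS placeholder files.
def bag_candidate_files(source_dir)
  Dir.children(source_dir)
     .select { |name| File.file?(File.join(source_dir, name)) }
     .reject { |name| name.start_with?(NFS_PLACEHOLDER_PREFIX) }
     .sort
end
```

A prefix check is deliberately blunt: it would also skip any legitimate file whose name happens to start with ".nfs", which seems unlikely for a metadata folder but is worth keeping in mind.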

It's not clear how to prevent the '.nfsXXX' files from being created. Potentially that would be solved by changing how Goobi sends data to SDR so that it doesn't create a stubContentMetadata.xml file in the first place.

andrewjbtw commented 1 year ago

Re-opening because this occurred again today and I'm pretty sure the linked PR has been deployed, which should have prevented it.

peetucket commented 1 year ago

Can you provide the druid for the object where this occurred again, for troubleshooting?

andrewjbtw commented 1 year ago

Yes, it's druid:zb779kk9308 (https://argo.stanford.edu/view/druid:zb779kk9308). I may end up remediating it if I hear from the depositor, but I'm leaving it alone for now.

peetucket commented 1 year ago

I looked at the metadata folder /dor/workspace/zb/779/kk/9308/zb779kk9308/metadata on dor-services-app-prod and do not see any of those .nfs files there. I have a feeling rerunning the step will work, though I cannot figure out what caused it to error the first time since I don't see any specific error messages anywhere (unless you have a lead). Maybe the file got cleaned up in the meantime? Though it shouldn't have mattered, unless the code is still buggy. I could try and add one again manually and then verify the code I wrote ignores it.

andrewjbtw commented 1 year ago

The problem is that the file gets included in the bag in the /dor/export folder, which means that the system expects it to be sent to preservation. When the file gets deleted, it means the bag can't be valid and accessioning will always fail until the bag is regenerated.
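To make that failure mode concrete, here is a small Ruby sketch that reports manifest entries whose files no longer exist in the bag. It assumes the manifest is a BagIt-style file named `manifest-md5.txt` in the bag root, with one "checksum path" pair per line; the helper name `missing_manifest_entries` is hypothetical and is not part of any SDR codebase.

```ruby
# Hypothetical check for illustration only.
# Assumption: the bag root contains a BagIt-style manifest with lines of the
# form "<checksum> <relative path>", like the listing earlier in this issue.
def missing_manifest_entries(bag_dir, manifest_name = "manifest-md5.txt")
  manifest_path = File.join(bag_dir, manifest_name)
  File.readlines(manifest_path, chomp: true).filter_map do |line|
    # Split into checksum and path; the limit of 2 keeps paths containing spaces intact.
    _checksum, path = line.split(/\s+/, 2)
    next if path.nil? || path.empty?

    # Keep only the paths that the manifest lists but the bag no longer contains.
    path unless File.exist?(File.join(bag_dir, path))
  end
end
```

If this returned a path such as `data/metadata/.nfs00000000097b892a00000325`, the bag would list a file that has since disappeared, so it would need to be regenerated rather than retried.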

peetucket commented 1 year ago

OK, I see it here, so it must still be a problem:

/dor/export/zb779kk9308/data/metadata

peetucket commented 1 year ago

I see the problem: my regex filtering these files out was too restrictive and did not account for the fact that letters (and not just numbers) can be present in the filenames. I am adjusting the regex and test and will have a PR with the fix up shortly. The currently busted druid will probably need to be remediated (i.e. destroy the current bag in /dor/export and then let it be recreated). Since the .nfs file is no longer in the source metadata folder, you don't even need to wait for this fix to do that.
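The difference can be sketched in Ruby as follows. Both patterns are hypothetical reconstructions for illustration; the actual regex before and after the fix is not quoted in this thread.

```ruby
# Hypothetical patterns for illustration; the real regexes are not shown in this issue.
# A digits-only pattern misses names that contain letters.
DIGITS_ONLY_NFS = /\A\.nfs\d+\z/
# Allowing letters as well as digits after the ".nfs" prefix catches them.
ALPHANUMERIC_NFS = /\A\.nfs[0-9a-zA-Z]+\z/

# Filename taken from the manifest listing earlier in this issue.
problem_file = ".nfs00000000097b892a00000325"

digits_only_matches = DIGITS_ONLY_NFS.match?(problem_file)   # false: the name contains "b" and "a"
alphanumeric_matches = ALPHANUMERIC_NFS.match?(problem_file) # true
```

Anchoring with `\A` and `\z` keeps the pattern from matching names that merely contain ".nfs" somewhere in the middle.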

andrewjbtw commented 1 year ago

Thanks! I sent this object back through bag generation and it's now accessioned.