Closed andrewjbtw closed 3 months ago
For consideration: Generate cocina early so that it is validated and catches problems.
To test this:
.
├── manifest.csv
└── xs951nf4814
    ├── file.txt
    └── xs951nf4814
        └── hello.txt
(In this example the druid is the name of a folder within the content but the outcome is the same whether it is a folder or a file.)
Run Preassembly.
See that Preassembly shows an error.
Check the /dor/assembly filesystem. If files are left behind, it means Preassembly ran partway before the cocina validation kicked in:
/dor/assembly/xs/951/nf/4814
└── xs951nf4814
    ├── content
    │   ├── file.txt
    │   └── xs951nf4814
    │       └── hello.txt
    └── metadata
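For anyone repeating this check, here is a minimal Python sketch of the "are files left behind?" step. It assumes the druid-tree layout visible in the path above (two letters, three digits, two letters, four digits, each as a directory level). The function names `druid_tree_path` and `leftover_files` are hypothetical helpers for illustration, not part of Preassembly, and the regex is a simplification of real druid validation.

```python
import re
from pathlib import Path, PurePosixPath

# Simplified druid shape (assumption): 2 letters, 3 digits, 2 letters, 4 digits.
DRUID_PARTS = re.compile(r"([a-z]{2})(\d{3})([a-z]{2})(\d{4})")


def druid_tree_parts(druid):
    """Split a druid such as 'xs951nf4814' into ('xs', '951', 'nf', '4814')."""
    match = DRUID_PARTS.fullmatch(druid)
    if match is None:
        raise ValueError(f"not a druid: {druid!r}")
    return match.groups()


def druid_tree_path(druid, root="/dor/assembly"):
    """Return the druid-tree directory for a druid under the given root."""
    return PurePosixPath(root, *druid_tree_parts(druid))


def leftover_files(druid, root="/dor/assembly"):
    """List files found under the druid-tree directory, relative to it.

    An empty list means nothing was written for this druid.
    """
    base = Path(root).joinpath(*druid_tree_parts(druid))
    if not base.exists():
        return []
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )
```

Calling `leftover_files("xs951nf4814")` after a failed run would return a non-empty list if Preassembly copied content before the validation error was raised.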
@andrewjbtw should no files or folders include the druid? Is druid.pdf OK, for example? Or is it just folders that shouldn't have the druid?
Thanks.
There's no problem with a folder or filename including the druid, as long as the name includes more than just the druid. So druid.pdf, druid.tif, etc. are all ok.
The problem is specifically when a file or folder name exactly matches the druid. The type of failure is different when it's a file or a folder, but both cases should be disallowed.
It's unlikely for someone to name a file just druid with no extension, but if they do, it's a problem.
Thanks for the clarification, that helps :-)
Related to https://github.com/sul-dlss/cocina-models/issues/732
Support for both the old and new Stacks file layouts means that we can't allow the root content folder of a deposit to contain a folder or file whose name is identical to the item's druid. We should catch this in Preassembly so that items do not get caught in accessioning.
My suggestion is that we follow the strategy already in place to prevent using file hierarchies with non-file content types. I don't remember exactly how that was implemented but it prevents an item from starting accessioning if it breaks the validation rule for hierarchical files. I don't think this is a Cocina-level validation.
Note that for preassembly, most users stage their content using this pattern:
That staging file layout uses the druid as the name of the folder that serves as the container used to carry files into the SDR. This is permitted because that "container" is discarded and only the content files within the container are added to the druid.
What we need to prevent would look like this:
In that layout, the container has a folder (or file) named for the druid within the content itself. If this item got shelved, it would create a name collision.
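As a rough illustration of the rule described above, the following Python sketch checks only the top level of the staging container for an entry whose name is exactly the druid. The function name `druid_name_collision` and its return values are assumptions made for this example. It is not how Preassembly implements validation, and it does not reflect the existing file-hierarchy check mentioned earlier.

```python
from pathlib import Path


def druid_name_collision(container, druid):
    """Look for a top-level entry in the container named exactly like the druid.

    Returns 'folder' or 'file' if such an entry exists, otherwise None.
    Names that merely contain the druid (e.g. 'xs951nf4814.pdf') are allowed.
    """
    for entry in Path(container).iterdir():
        # Exact, case-sensitive comparison against the entry's own name.
        if entry.name == druid:
            return "folder" if entry.is_dir() else "file"
    return None
```

With the staging layout from the test steps, `druid_name_collision("xs951nf4814", "xs951nf4814")` run from the directory holding `manifest.csv` would report `'folder'`, and a pre-accessioning check could reject the item at that point.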