Open edsu opened 8 years ago
Perhaps the spec doesn't need to specify a normalization form, as long as applications normalize both the path from the manifest and the path from the filesystem when comparing?
I think trying to mandate a normalization form would be hard but perhaps there should be a prominent guide for implementors? We could follow with enhancement requests for the known open source projects.
:+1: mandating seems hard (fruitless), but a note to implementors to pick one when comparing the filesystem paths against the manifest filenames seems like a good idea?
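For anyone wanting to try the "normalize both sides when comparing" idea, here is a minimal sketch in Python. It is not taken from any existing BagIt implementation. The function names `normalize_path` and `missing_from_filesystem` are hypothetical, and the choice of NFC is an assumption; either form should work as long as both sides use the same one.

```python
import unicodedata


def normalize_path(path, form="NFC"):
    """Return the path in the given Unicode normalization form (hypothetical helper)."""
    return unicodedata.normalize(form, path)


def missing_from_filesystem(manifest_paths, filesystem_paths, form="NFC"):
    """List manifest entries with no match on disk, comparing normalized forms.

    Both inputs are iterables of str paths. The original manifest spelling
    is returned so it can be reported to the user unchanged.
    """
    on_disk = {normalize_path(p, form) for p in filesystem_paths}
    return [p for p in manifest_paths if normalize_path(p, form) not in on_disk]


# Manifest written on a system that decomposes names (NFD: "e" + combining acute).
manifest = ["data/cafe\u0301.txt", "data/other.txt"]
# Filesystem listing that returns the precomposed name (NFC: single "é").
filesystem = ["data/caf\u00e9.txt"]

missing = missing_from_filesystem(manifest, filesystem)
print(missing)
```

Only `data/other.txt` is reported as missing here. The accented filename matches even though the two strings differ code point by code point.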
@edsu how does the proposed recommendation in https://github.com/loc-rdc/bagitspec/pull/1/ and especially https://github.com/loc-rdc/bagitspec/pull/1/commits/f898aff4ee89c441ee6931f708d942551ad549a4 sound?
The BagIt specification lets you specify that UTF-8 encoding be used in tag manifests. But it doesn't appear to assume a particular normalization form.
I have a problem where files are bagged on an OS X filesystem (which stores filenames in NFD) and then copied to Linux (where filenames typically end up in NFC). During validation, the NFC form from the filesystem is compared against the NFD form from the manifest, and validation fails.
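To illustrate the mismatch described above, here is a small Python sketch. The filename is made up for the example, and this is not BagIt code. It only shows that the same visible name has different code points and different UTF-8 bytes in the two forms.

```python
import unicodedata

# Hypothetical filename containing one accented character.
name = "caf\u00e9.txt"

nfc = unicodedata.normalize("NFC", name)  # precomposed: single "é" code point
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" + U+0301 combining acute

print(nfc == nfd)           # the strings compare unequal
print(len(nfc), len(nfd))   # NFD has one more code point
print(nfc.encode("utf-8"))  # byte sequences differ as well
print(nfd.encode("utf-8"))
```

A naive string or byte comparison between a manifest entry and a directory listing will therefore fail whenever the two sides were produced under different normalization forms.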
Should a particular normalization form (NFC?) be assumed for Unicode encodings?