jkunze / bagitspec

manifest filename with newline #2

Closed edsu closed 10 years ago

edsu commented 10 years ago

Over in the bagit-python repository, an issue was opened regarding a validation error for a newly created bag. The issue was tracked down to filenames with an embedded carriage return (0x0d), which made their way into the manifest and ultimately disrupted validation.

One approach would be to prevent the creation of bags whose filenames contain an embedded CR, LF or CRLF. This would involve raising an exception or error during bag creation. Another would be to allow these filenames in the manifest, but to take care to encode them in a way that doesn't disturb the line-oriented format of the manifest.

I think it's in the spirit of BagIt to do the latter, accepting that some filesystems allow CR, LF and CRLF to be present in the filename.
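
To make the second approach concrete, here is a minimal sketch, assuming percent-encoding is the chosen escape scheme (this comment does not settle on one). The function names `encode_manifest_path`, `decode_manifest_path` and `manifest_line` are hypothetical and are not part of bagit-python.

```python
# Sketch only: percent-encoding is an assumed scheme, and these helper
# names are hypothetical (not bagit-python API).

def encode_manifest_path(path: str) -> str:
    """Escape CR and LF so a path always fits on one manifest line."""
    # Encode '%' first so the escapes added below are not re-encoded.
    return (
        path.replace("%", "%25")
        .replace("\r", "%0D")
        .replace("\n", "%0A")
    )


def decode_manifest_path(encoded: str) -> str:
    """Reverse encode_manifest_path (expects uppercase hex escapes)."""
    # Restore CR and LF first, and '%' last, mirroring the encoder.
    return (
        encoded.replace("%0D", "\r")
        .replace("%0A", "\n")
        .replace("%25", "%")
    )


def manifest_line(checksum: str, path: str) -> str:
    """Build one manifest entry: checksum, separator, encoded path."""
    return f"{checksum}  {encode_manifest_path(path)}\n"
```

With this scheme a file named `data/bad\rname.txt` would be written as `data/bad%0Dname.txt`, and a validator would decode the path before looking it up on disk. A filename that already contains a literal `%` still round-trips, because `%` itself is escaped.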