rfjakob / cshatag

Detect silent data corruption under Linux using sha256 stored in extended attributes
MIT License

MacOS: Lots of errors are thrown when run on ExFAT drive #23

Open Ken0sis opened 2 years ago

Ken0sis commented 2 years ago

Background: I'm a first-time user of cshatag, so please let me know if this is just user error. I am on an M1 Mac and have an external hard drive that's ExFAT formatted.

Description of Error:

  1. cshatag seems to run fine when creating checksums with cshatag -recursive, and I can see that the user.shatag.ts and user.shatag.sha256 attributes are written to the files when I check with xattr -l.
  2. However, when I run cshatag -recursive -qq, I see lots of "operation not permitted" errors for files that start with ._. I tried removing these dot files to see if cshatag could still check for errors, but that also removes the extended attributes.
  3. cshatag -qq runs fine if I check just one file.

Question:

  1. It looks like the extended attributes are written to dot files, and these dot files are throwing errors when cshatag runs a check. Is this a problem because I'm using ExFAT drive on a Mac?
  2. Maybe there's a setting that I've overlooked?

Thanks for helping/explaining how I should be using cshatag

duncanbarth commented 2 years ago

Not the developer, but you're right in narrowing it down to ExFAT on a Mac.

cshatag needs extended attributes, which ExFAT does not support. MacOS has a compatibility layer where, on filesystems like ExFAT without xattr support, it stores them in a 'sidecar' file with the same name, prefixed by ._ (for example, the attributes of photo.jpg would go into ._photo.jpg). The consequence is that it won't let you read/write extended attributes on any files with a ._ prefix on these filesystems.

So in your example - the first time you run cshatag in recursive mode, it writes out the timestamp/checksum information to the extended attributes of all the files and MacOS then creates the ._ files to store them. The next time you run cshatag in recursive mode, it sees the ._ files and tries to read the extended attributes on them, and then fails, since MacOS doesn't support xattrs on these sidecar files.

Given that the ._ format is proprietary to MacOS (other operating systems don't know/care that they are storing extended attributes for other files, so cshatag on a different platform would not be able to check the attributes), the easiest fix would be to just use a drive formatted as HFS+ (aka 'Mac OS Extended') or APFS.

If that's not going to work, your next best bet is to not use the -recursive option in cshatag, and instead run cshatag individually on the non ._ files (potentially automated via 'find' or the like).

Theoretically, cshatag could be modified to ignore ._ files when running in recursive mode when running on MacOS and when the filesystem doesn't natively support extended attributes (on filesystems with support, MacOS will handle xattrs on files prefixed with ._ ). However, I'm not the developer and this seems like it could get pretty messy.

Ken0sis commented 2 years ago

@duncanbarth Thanks for sharing what you know about this. I would like to hear the developer's thoughts on this as well. As far as I can tell, it's the ._ files that are causing errors to be thrown because cshatag does not have permission to edit them. I've already reformatted my hard drive as APFS, and that does seem to resolve the problem. However, I think a more sustainable solution for the cshatag project would be to ignore ._ files when running integrity checks, because the user doesn't care about the integrity of these ._ files. This would give cshatag more universal appeal, so that users are less restricted in which filesystem they use.

There's one strange behavior I'm observing though. Even after converting to APFS, cshatag still throws errors at some ._ files that are generated by Spotlight, which is not a big deal since there aren't a lot of them. However, this behavior is inconsistent: sometimes it doesn't throw an error at all, because it's actually able to generate a SHA tag for these ._ files as well. This sounds like a good thing, but it makes me wonder whether cshatag was supposed to tag those ._ files on ExFAT all along and I just didn't use it right.

duncanbarth commented 2 years ago

The whole MacOS ._ workaround for filesystems without extended attributes support is a bit of a hack, has caveats you've run into, and to some degree breaks cross platform compatibility. But all this is a function of MacOS, not cshatag; cshatag requires extended attribute support and expects the operating system to be able to read/write extended attributes.

I believe ignoring the ._ files by default is a bad idea. In most cases (i.e. under Linux or MacOS with an xattr supported filesystem), ._ files are perfectly valid, support extended attributes, and it would be reasonable for end users to expect them to be processed like any other file. Given the whole point of the tool is validating file integrity, silently ignoring a class of files is problematic.

As an alternate solution, I could see potentially adding a new command line option to cshatag. Something like '-skipdotbar' so that files beginning with ._ are ignored and reported to stdout via a new file status of <skipped>. Making it an option also avoids having to put a bunch of OS/filesystem detection code into cshatag.
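To make the proposal concrete, here is a small standalone Go sketch of how such an opt-in flag could behave. It is not cshatag's code: the -skipdotbar flag and the <skipped> status are only the suggestion from this comment, statusFor is a name invented for the example, and the "status, then path" output layout is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"path/filepath"
	"strings"
)

// statusFor returns "<skipped>" for files whose name begins with "._"
// when skipping is enabled, and "" when the file should be processed
// normally. (Hypothetical helper, not part of cshatag.)
func statusFor(path string, skipDotBar bool) string {
	if skipDotBar && strings.HasPrefix(filepath.Base(path), "._") {
		return "<skipped>"
	}
	return ""
}

func main() {
	// Off by default, so no class of files is ever ignored silently.
	skip := flag.Bool("skipdotbar", false, "skip files whose name begins with ._ (hypothetical option)")
	flag.Parse()
	for _, path := range flag.Args() {
		if s := statusFor(path, *skip); s != "" {
			fmt.Printf("%s %s\n", s, path)
			continue
		}
		// A real tool would hash the file and compare xattrs here.
		fmt.Printf("(would be checked) %s\n", path)
	}
}
```

Because the decision depends only on the file name and an explicit flag, no OS or filesystem detection is needed, and every skipped file still shows up in the output.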

rfjakob commented 2 years ago

Hi, I think https://github.com/rfjakob/cshatag/issues/23#issuecomment-1031962572 is spot-on.

Thinking about what to do. Maybe not printing errors for ._ files on MacOS with -qq.

rfjakob commented 2 years ago

PS: Detecting MacOS is easy, but detecting the filesystem under MacOS is kind of messy, so I would prefer to not do that :)
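For readers wondering what the "easy" part looks like: in Go, the operating system is available as runtime.GOOS, which is "darwin" on MacOS. The sketch below combines that with the idea from the previous comment. It is only an illustration, not cshatag's implementation; suppressError and the quietLevel parameter (2 standing for -qq) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
	"strings"
)

// suppressError reports whether an xattr error for path should be hidden.
// goos is passed in (rather than read from runtime.GOOS inside the
// function) so the rule can be exercised on any platform.
// (Hypothetical helper, not part of cshatag.)
func suppressError(goos, path string, quietLevel int) bool {
	return goos == "darwin" &&
		quietLevel >= 2 &&
		strings.HasPrefix(filepath.Base(path), "._")
}

func main() {
	// The result depends on the platform this runs on.
	fmt.Println(suppressError(runtime.GOOS, "/Volumes/disk/._photo.jpg", 2))
}
```

Note that this checks only the OS and the file name. It does not try to find out whether the underlying filesystem supports xattrs natively, which is the part described above as messy.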