stefmolin / exif-stripper

An easy-to-use tool to ensure image metadata (EXIF data and extended attributes) is removed.
https://pypi.org/project/exif-stripper/
Apache License 2.0
16 stars 5 forks source link

[BUG] Changing the files even if the exif is empty #56

Open Lee-W opened 4 days ago

Lee-W commented 4 days ago

Required attestation

Describe the bug Exif-stripper cleans and saves image files even when there is no EXIF data to remove, which causes the pre-commit hook to fail.

To Reproduce

# Change the directory to some place for testing
cd /tmp
mkdir test-exif-stripper
cd test-exif-stripper

# set up git for the project
git init

# [manual step] move some images to test-exif-stripper directory

git add .
pre-commit install
git cmt -m "add new images"

The pre-commit keeps showing

strip-exif....................................................................Failed
- hook id: strip-exif
- exit code: 1

Stripped metadata from ...

Expected behavior A clear and concise description of what you expected to happen.

Screenshots If applicable, add screenshots to help explain your problem.

image

Environment

Additional context

This is due to https://github.com/stefmolin/exif-stripper/blob/d387c8b3be7b6df4d9ff6da005a1081caadf1476/src/exif_stripper/__init__.py#L31. It does not return None even if there's no data.

github-actions[bot] commented 4 days ago

It looks like this is your first issue here – welcome! Please familiarize yourself with the contributing guidelines, if you haven't already.

stefmolin commented 4 days ago

Hi @Lee-W, nice to hear from you 😊

Your example mentions moving some images, but you don't provide any examples for me to test. I have also not seen this happen on my machine, so I really need a reproducible example here.

If you see the pre-commit failure after adding the images, remember that you need to do git add . after the hook changes them as well. Otherwise, pre-commit stashes the changes that exif-stripper made (which are unstaged) and then it will continue to fail (because it is still running on the original file).


I'm also not seeing an issue with the line you reference. You can confirm this by running the following test, which succeeds because the EXIF is falsey:

def test_empty_exif(tmp_path):
    image_file = tmp_path / 'test.png'
    with Image.new(mode='1', size=(2, 2)) as im:
        im.save(image_file)
    assert not im.getexif()

While the EXIF class does not have a __bool__() method, Python also checks __len__() (see "Truth Value Testing" in the Python docs):

By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero, when called with the object.

Lee-W commented 4 days ago

Hi @Lee-W, nice to hear from you 😊

Thank you for your kind words about commitizen on the podcast! πŸ™Œ

Your example mentions moving some images, but you don't provide any examples for me to test. I have also not seen this happen on my machine, so I really need a reproducible example here.

Any image would failed on my machine πŸ€” (a bit more detail below)

If you see the pre-commit failure after adding the images, remember that you need to do git add . after the hook changes them as well. Otherwise, pre-commit stashes the changes that exif-stripper made (which are unstaged) and then it will continue to fail (because it is still running on the original file).

I'm also not seeing an issue with the line you reference. You can confirm this by running the following test, which succeeds because the EXIF is falsey:

def test_empty_exif(tmp_path):
    image_file = tmp_path / 'test.png'
    with Image.new(mode='1', size=(2, 2)) as im:
        im.save(image_file)
    assert not im.getexif()

While the EXIF class does not have a __bool__() method, Python also checks __len__() (see "Truth Value Testing" in the Python docs):

By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero, when called with the object.

Ah, you're right πŸ€¦β€β™‚οΈ By the time I saw the object was returned, I thought this was the root cause, which turns out it is not πŸ€¦β€β™‚οΈ I tested it a bit more and found out it's actually due to https://github.com/stefmolin/exif-stripper/blob/d387c8b3be7b6df4d9ff6da005a1081caadf1476/src/exif_stripper/__init__.py#L43-L44. It's likely a mac specific thing. Did a quick survey https://apple.stackexchange.com/questions/450118/what-is-the-com-apple-provenance-xattr-extended-attribute-and-how-can-i-dele. It's even possibly a terminal-specific thing πŸ€” (I'm using iTerm), but even I change it to terminal.app it still fails. Will did into it a bit more and see what's happening

Lee-W commented 3 days ago

I just verified on my machine this com.apple.provenance xattr is something that cannot be removed by xattr lib on macOS. I also tried the commands mentioned in the article. Still cannot remove it

stefmolin commented 3 days ago

Hmm, I have the same setup as you (macOS and iTerm). From what I gathered, it is related to the permissions you give to iTerm (see https://github.com/stefmolin/exif-stripper/issues/4#issuecomment-2179642698). Another user mentioned having an issue with a different attribute that I also wasn't able to replicate, so I want to do more research on extended attributes before proceeding.

Lee-W commented 3 days ago

Hmm, I have the same setup as you (macOS and iTerm). From what I gathered, it is related to the permissions you give to iTerm (see #4 (comment)). Another user mentioned having an issue with a different attribute that I also wasn't able to replicate, so I want to do more research on extended attributes before proceeding.

I tried to research a bit, but it seems to be an internal thing without an explicit document. (or simply I just did not find it πŸ₯²) But instead of skipping some tags like what I do now in https://github.com/stefmolin/exif-stripper/pull/57. I'm thinking of another solution.

The has_changed value here and here does not actually reflect whether the value has been changed (in most case yes), but whether the value has been cleaned. If there's any value that cannot be cleaned due to some reason, the has_changed value would still be true while it probably should not be. I'm thinking of comparing the list before and after cleaning the attributed and set this value. If the value is False, but the attribute is still not empty, we can raise a warning. WDYT?

stefmolin commented 2 days ago

That sounds like a better idea, but let's only do it for the extended attributes (xattr). The current behavior will result in being unable to commit images in which we can't remove the EXIF metadata, and that is a behavior I want to preserve.

Lee-W commented 2 days ago

That sounds like a better idea, but let's only do it for the extended attributes (xattr). The current behavior will result in being unable to commit images in which we can't remove the EXIF metadata, and that is a behavior I want to preserve.

sounds good! just removed the exif part and do it on xattr only