weigertlab / spotiflow

Accurate and efficient spot detection for microscopy data
https://weigertlab.github.io/spotiflow/
BSD 3-Clause "New" or "Revised" License

TIFF file error when running spotiflow-predict on folder on mounted storage #12

Closed lfra closed 4 months ago

lfra commented 4 months ago

Dear spotiflow-team, I encounter an error when running spotiflow-predict from the terminal on a folder with TIF files that is located on a remote storage location. The odd thing is when I run spotiflow-predict on a single TIF within that same folder, it works without issues.

Specific example:

spotiflow-predict /path/to/folder/myfile.tif >> works

spotiflow-predict /path/to/folder >> fails (error message below)

INFO:spotiflow.cli.predict:Spotiflow - version 0.3.2
INFO:spotiflow.model.spotiflow:Loading pretrained model general
Traceback (most recent call last):
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/lib/python3.9/site-packages/tifffile/tifffile.py", line 4261, in __init__
    byteorder = {b'II': '<', b'MM': '>', b'EP': '<'}[header[:2]]
KeyError: b'\x00\x05'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/bin/spotiflow-predict", line 8, in <module>
    sys.exit(main())
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/lib/python3.9/site-packages/spotiflow/cli/predict.py", line 62, in main
    images = [imread(img) for img in image_files]
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/lib/python3.9/site-packages/spotiflow/cli/predict.py", line 62, in <listcomp>
    images = [imread(img) for img in image_files]
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/lib/python3.9/site-packages/skimage/io/_io.py", line 60, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/lib/python3.9/site-packages/skimage/io/manage_plugins.py", line 217, in call_plugin
    return func(*args, **kwargs)
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/lib/python3.9/site-packages/skimage/io/_plugins/tifffile_plugin.py", line 74, in imread
    return tifffile_imread(fname, **kwargs)
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/lib/python3.9/site-packages/tifffile/tifffile.py", line 1239, in imread
    with TiffFile(
  File "/Users/l.m.frank-2/opt/miniconda3/envs/spotiflow/lib/python3.9/site-packages/tifffile/tifffile.py", line 4263, in __init__
    raise TiffFileError(f'not a TIFF file {header!r}') from exc
tifffile.tifffile.TiffFileError: not a TIFF file b'\x00\x05\x16\x07'

I already tried to see if the TIF files have a strange header/format, but everything seems alright (checked with "file myfile.tif" in terminal and it returns myfile.tif: TIFF image data, little-endian). As mentioned, when running it on a single file, everything works.

When I make a local copy of the folder and specify that folder path, it runs normally on all files in it. So, could it indeed be a problem with how spotiflow tries to find the TIFs in the folder on the mounted storage?

The storage is mounted via Mountain Duck (sftp). I can access and navigate through the folder without issues both via Finder and the terminal. I do not know if this helps, but this is the path Mountain Duck creates for the folder:

/Users/l.m.frank-2/Library/Group Containers/G69SCX94XU.duck/Library/Application Support/duck/Volumes.noindex/SFTP.xx.xx.localized/storage/myfolder

Thank you very much in advance!

Environment

AlbertDominguez commented 4 months ago

Hi! Hmm... I just tried it on some images of mine and it does work (although the filesystem is not mounted through sftp). The loading is done by another library (skimage), which in turn relies on yet another library (tifffile) to load TIFs, so I don't think it's an issue on our end. How many TIF files do you have in the directory? It could be that one of the files is corrupt and that's why it breaks (and by chance it is not the one you are trying). In case you haven't checked all of them, you can try to load them with the following code (in a Python console or as a script; please change the img_dir variable so that it is the directory containing your images!):

from itertools import chain
from pathlib import Path

from skimage.io import imread

ALLOWED_EXTENSIONS = ("tif", "tiff", "png", "jpg", "jpeg") 

img_dir = Path("/YOUR/DATA/DIRECTORY/HERE")
image_files = sorted(
    tuple(chain(*tuple(img_dir.glob(f"*.{ext}") for ext in ALLOWED_EXTENSIONS)))
)
assert len(image_files) > 0, "No images found in the given folder" 
for fname in image_files:
    try:
        imread(fname)
    except Exception as e:
        print(f"Could not load image {fname}")
        raise e
print("All files read OK!")

It should print the name of the file that fails to load before raising the error, if there is one. If that's the case, then one of the files might be corrupt, or there might be some issue with the SFTP mount. Can you double-check that? If all files can be loaded normally, then we can look deeper...
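If more than one file fails, a variant that keeps going and collects every failure can be more informative than stopping at the first one. The following is only a minimal sketch, not part of spotiflow: `find_unreadable` and its `loader` parameter are hypothetical names, and any reader (for example `skimage.io.imread`) can be passed in.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def find_unreadable(
    files: Iterable, loader: Callable
) -> List[Tuple[Path, Exception]]:
    """Try to load every file; return (path, error) pairs for the failures."""
    failures = []
    for fname in files:
        try:
            loader(fname)
        except Exception as exc:  # record the failure and keep going
            failures.append((Path(fname), exc))
    return failures


# Example usage (assumes scikit-image is installed):
# from skimage.io import imread
# img_dir = Path("/YOUR/DATA/DIRECTORY/HERE")
# for path, err in find_unreadable(sorted(img_dir.glob("*.tif")), imread):
#     print(f"Could not load image {path}: {err}")
```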

lfra commented 4 months ago

Thanks for your fast reply! I tested what you suggested and it gives an error when loading the first TIF in that folder (in this example, img1.tif). I am sure that this file is not corrupted, because I can run spotiflow-predict on that single file in the same folder just fine (when specifying the file path).

I noticed that in the "Could not load image" error message the path to the file is somehow different/not correct. It somehow adds a "._" prefix, but the file is named "img1.tif".

Could not load image /storage/sample_data/chsplit/53BP1/._img1.tif

Edit: Just found this: https://www.reddit.com/r/learnpython/comments/sgj31e/pltimread_and_cv2imread_adding_to_the_beginning/?rdt=59388 It could be that these are hidden 'resource fork' metadata files from the macOS file system, which it attempts to load and fails because they are not TIFs. Unfortunately, I do not see them in that folder in Finder when I unhide the hidden files, but I can see these ._ files in the terminal (with ls -a).
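For reference, the four bytes reported in the traceback, b'\x00\x05\x16\x07', are the AppleDouble magic number (0x00051607), which fits the resource-fork explanation. A minimal sketch of a header check is below. `sniff_header` is a hypothetical helper, not part of spotiflow or tifffile, and it only mirrors the two-byte byte-order lookup visible in the traceback rather than validating a full TIFF.

```python
from pathlib import Path

# Byte-order marks taken from the tifffile line shown in the traceback.
TIFF_BYTE_ORDER_MARKS = (b"II", b"MM", b"EP")
# AppleDouble files (the "._name" sidecars) start with 0x00051607.
APPLEDOUBLE_MAGIC = b"\x00\x05\x16\x07"


def sniff_header(path) -> str:
    """Classify a file as 'appledouble', 'tiff' or 'unknown' from its first bytes."""
    with Path(path).open("rb") as fh:
        header = fh.read(4)
    if header == APPLEDOUBLE_MAGIC:
        return "appledouble"
    if header[:2] in TIFF_BYTE_ORDER_MARKS:
        return "tiff"
    return "unknown"
```

Running it on both img1.tif and ._img1.tif should make the difference between the image and its sidecar visible without loading either one.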

lfra commented 4 months ago

Found a fix! It is indeed not a spotiflow issue; it is caused by the ._ resource fork files. I can use the "dot_clean" command in the terminal on the folder to remove/merge these hidden files safely. When I then run spotiflow-predict on the cleaned-up folder, it works. It is not an ideal solution, because I always need to remember to remove these hidden ._ files from every folder before running spotiflow... Maybe you have a better idea.

AlbertDominguez commented 4 months ago

Makes sense, good catch! I just added an argument to the CLI to exclude hidden files and made a new release. You should reinstall (pip install -U spotiflow) and then run as you did before, but with an extra argument: spotiflow-predict ... --exclude-hidden-files. That should avoid trying to load the resource fork files!
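For anyone scripting their own loading, the same idea can be sketched in a few lines. This is an illustration only and not necessarily how the spotiflow CLI implements the option: `list_images` and `exclude_hidden` are hypothetical names. It relies on the fact that `Path.glob("*.tif")`, unlike a shell glob, also matches names that begin with a dot, which is why ._img1.tif was picked up in the first place.

```python
from itertools import chain
from pathlib import Path

ALLOWED_EXTENSIONS = ("tif", "tiff", "png", "jpg", "jpeg")


def list_images(img_dir, exclude_hidden=True):
    """Return the sorted image paths in img_dir, optionally skipping dotfiles."""
    img_dir = Path(img_dir)
    candidates = chain.from_iterable(
        img_dir.glob(f"*.{ext}") for ext in ALLOWED_EXTENSIONS
    )
    # Hidden files (including "._" AppleDouble sidecars) start with a dot.
    return sorted(
        p for p in candidates
        if not (exclude_hidden and p.name.startswith("."))
    )
```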