Error in calling `trackpy.batch` with multi-frame TIFF file

jacopoabramo commented 2 years ago

Greetings,

I'm currently trying to use trackpy to localize some particles in a TIFF video recording. I'm using PIMS because I also apply a preprocessing function before calling the localization. Calling trackpy.locate on a single frame works without problems, but when I call trackpy.batch on the entire video I get the following stack trace:

Traceback (most recent call last):
  File "C:\git\iScatRepoContainer\data_analysis\iScat_particle_analysis.py", line 260, in <module>
    calculate_particle_track(video,
  File "C:\git\iScatRepoContainer\data_analysis\iScat_particle_analysis.py", line 129, in calculate_particle_track
    batch_df : pd.DataFrame = tp.batch(
  File "C:\Users\iScat_Lab\AppData\Roaming\Python\Python39\site-packages\trackpy\feature.py", line 557, in batch
    for i, features in enumerate(map_func(curried_locate, frames)):
  File "C:\Users\iScat_Lab\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 870, in next
    raise value
tifffile.tifffile.TiffFileError: TiffPage 18: corrupted tag list at offset 320002345

My multiframe TIFF is of size [4000, 200, 200]. Is there anything I can do to fix this?

Thanks.

nkeim commented 2 years ago

Can this problem be isolated to trackpy? I.e. is there a problem if you access all the frames like this:

for frame in pimsreader:
    np.sum(frame.flat)

You would want to try it with and without the preprocessing.

jacopoabramo commented 2 years ago

Hi @nkeim,

Thanks for the quick reply. Here is a snippet of the code I just tested following your suggestion:

import pims
import numpy as np
import imgrvt as rvt
from dataclasses import dataclass
from time import time

@dataclass(frozen=True)
class RVTSettings:
    min_radius : int = 2
    max_radius : int = 25
    rvt_type : str = "normalized"
    highpass : int = 5
    coarse_factor : int = 1
    coarse_mode : str = "add"
    pad_mode : str = "constant"

@pims.pipeline
def apply_rvt(input_frame: np.ndarray,
              rvt_settings: RVTSettings) -> np.ndarray:
    """ Converts the input frame to float32 and applies
    Radial Variance Transform.
    """
    input_frame = input_frame.astype(np.float32)
    start = time()
    frame_rvt = rvt.rvt(input_frame,
                        rvt_settings.min_radius,
                        rvt_settings.max_radius,
                        rvt_settings.rvt_type,
                        highpass_size=rvt_settings.highpass,
                        coarse_factor=rvt_settings.coarse_factor,
                        coarse_mode=rvt_settings.coarse_mode,
                        pad_mode=rvt_settings.pad_mode)
    print(f"RVT execution: {time() - start}")
    return frame_rvt

filename = "C:\\git\\iScatRepoContainer\\test_data\\event2_MF_avF.tif"

for frame in apply_rvt(pims.open(filename), rvt_settings=RVTSettings()):
    np.sum(frame.flat)

RVT stands for Radial Variance Transform.

The code works fine when iterating over the frames one by one, with or without apply_rvt, so I suspect that something goes wrong inside batch. The RVT call is not the cause: I tested the main script without the RVT pipeline call and I still get the same error.

nkeim commented 2 years ago

Thanks. I have 2 more guesses:

1. batch() attempts to use the frame_no attribute of each image, if it exists. Could you add something like if hasattr(frame, 'frame_no'): print(frame.frame_no) to your loop and see if that triggers the exception?
2. This is a problem with multiprocessing. Use the processes=1 argument to batch().
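
For reference, the two guesses above could be checked with something like the following. This is only a minimal sketch: the helper names report_frame_numbers and batch_single_process are made up for illustration, and the only trackpy call it relies on is tp.batch with its processes argument. trackpy is imported inside the function so that the first helper can be used on its own.

```python
def report_frame_numbers(frames):
    """Guess 1: touch the frame_no attribute of every frame, as batch() does.

    Returns the collected frame numbers; any exception raised while reading
    a frame propagates to the caller, which is the point of the check.
    """
    numbers = []
    for frame in frames:
        if hasattr(frame, "frame_no"):
            numbers.append(frame.frame_no)
    return numbers


def batch_single_process(frames, diameter, **locate_kwargs):
    """Guess 2: run tp.batch without a multiprocessing pool.

    `diameter` and `locate_kwargs` are passed through unchanged; only
    processes=1 is forced here.
    """
    import trackpy as tp  # imported lazily, see the note above

    return tp.batch(frames, diameter, processes=1, **locate_kwargs)
```

With the snippet from the previous comment, this would be called as batch_single_process(apply_rvt(pims.open(filename), rvt_settings=RVTSettings()), diameter), where diameter is whatever value is already used for tp.locate.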

nkeim commented 2 years ago

One bonus guess: batch() actually both iterates over frames, and uses an index to access them:

        for i, features in enumerate(map_func(curried_locate, frames)):
            image = frames[i]
            if hasattr(image, 'frame_no') and image.frame_no is not None:
                frame_no = image.frame_no

Could there be a problem with using an index to access all the frames in your Tiff file?
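
A quick way to test this, sketched under the assumption that the reader supports len() and integer indexing (as PIMS readers normally do), is to index every frame and record which ones fail. The function name find_unreadable_frames is hypothetical. Checking a single index is not enough here, since the traceback points at TiffPage 18 rather than the first page.

```python
def find_unreadable_frames(frames):
    """Index every frame and return the indices that raise an exception.

    Mirrors the indexed access in batch() (image = frames[i]) followed by
    the frame_no lookup, but in a single process and without locating
    any features.
    """
    failures = []
    for i in range(len(frames)):
        try:
            image = frames[i]
            # Same attribute lookup that batch() performs on each image.
            getattr(image, "frame_no", None)
        except Exception as exc:
            failures.append((i, repr(exc)))
    return failures
```

An empty list means that indexed access works for every frame, which would point back towards the multiprocessing guess.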

jacopoabramo commented 2 years ago

batch() attempts to use the frame_no attribute of each image, if it exists. Could you add something like if hasattr(frame, 'frame_no'): print(frame.frame_no) to your loop and see if that triggers the exception?

I just tried this first option inside the loop of the snippet above; the attribute exists and no exception is raised.

This is a problem with multiprocessing. Use the processes=1 argument to batch().

But this would defeat the purpose of using batch() altogether, wouldn't it?

Could there be a problem with using an index to access all the frames in your Tiff file?

I don't see why; I just tried it out, and calling print(video[0]) works fine.

jacopoabramo commented 2 years ago

Hi @nkeim, can you confirm that this is a problem related to Windows, as mentioned here? I am also using Windows. I should probably have mentioned it earlier, apologies.

lleeming commented 1 year ago

Just in case anyone comes across this thread in the future, I had this issue on macOS and running batch() with the processes=1 argument fixed the issue for me.

soft-matter / trackpy

Error in calling `trackpy.batch` with multi-frame TIFF file #700