mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0

tiffslide with CLAM #169

Closed by mpapadomanolaki 1 year ago

mpapadomanolaki commented 1 year ago

Hello,

I have some whole slide images and want to use CLAM for multiple instance learning training. I want to read the images with tiffslide, so I replaced every 'import openslide' in the scripts with 'import tiffslide as openslide'. However, I have a problem with the dataloader, which I think arises from how tiffslide reads image regions. Specifically, num_workers is set to 4 in line 37. With num_workers set to 0 or 1 it runs without errors but is slow. With num_workers > 1 I get the error below from the dataloader. The weird thing is that, for the same image, the error doesn't always happen.

Any feedback appreciated.

  File "/gpfs/workdir/papadomama/GR_scripts/explore_CLAM/create_patches/CLAM/extract_features_fp.py", line 124, in <module>
    output_file_path = compute_w_loader(h5_file_path, output_path, wsi,
  File "/gpfs/workdir/papadomama/GR_scripts/explore_CLAM/create_patches/CLAM/extract_features_fp.py", line 48, in compute_w_loader
    for count, (batch, coords) in enumerate(loader):
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
imagecodecs._jpeg8.Jpeg8Error: Caught Jpeg8Error in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/gpfs/workdir/papadomama/GR_scripts/explore_CLAM/create_patches/CLAM/datasets/dataset_h5.py", line 158, in __getitem__
    img = self.wsi.read_region(
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tiffslide/tiffslide.py", line 386, in read_region
    arr: npt.NDArray[np.int] = get_zarr_selection(
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tiffslide/_zarr.py", line 193, in get_zarr_selection
    return grp[str(level)][selection]
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 807, in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 933, in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out,
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 976, in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 1267, in _get_selection
    self._chunk_getitem(chunk_coords, chunk_selection, out, out_selection,
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 1966, in _chunk_getitem
    cdata = self.chunk_store[ckey]
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/storage.py", line 724, in __getitem__
    return self._mutable_mapping[key]
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tifffile/tifffile.py", line 11308, in __getitem__
    return self._getitem(key)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tifffile/tifffile.py", line 11973, in _getitem
    chunk = keyframe.decode(
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tifffile/tifffile.py", line 7736, in decode_jpeg
    data_array: numpy.ndarray = imagecodecs.jpeg_decode(
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/imagecodecs/imagecodecs.py", line 995, in jpeg_decode
    raise exc
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/imagecodecs/imagecodecs.py", line 966, in jpeg_decode
    return imagecodecs.jpeg8_decode(
  File "imagecodecs/_jpeg8.pyx", line 332, in imagecodecs._jpeg8.jpeg8_decode
imagecodecs._jpeg8.Jpeg8Error: Unsupported marker type 0x81

mpapadomanolaki commented 1 year ago

Got an answer here: https://github.com/bayer-science-for-a-better-life/tiffslide/issues/57
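
The linked answer is not reproduced in this thread, so the following is only a hedged sketch of one pattern often tried for intermittent decode errors that appear only with several DataLoader workers. It assumes (this is not confirmed here) that the cause is a slide handle opened in the parent process and then shared by forked workers, so that concurrent reads interleave on one file descriptor. The class name `LazySlidePatches` and the `open_slide` parameter are hypothetical and are not part of CLAM or tiffslide; in real use `open_slide` could be something like `tiffslide.TiffSlide`.

```python
import os


class LazySlidePatches:
    """Map-style dataset sketch: opens the slide lazily, once per process.

    Hypothetical helper, not CLAM's dataset class. Any object with
    __len__ and __getitem__ can be passed to torch's DataLoader.
    """

    def __init__(self, slide_path, coords, patch_size, open_slide, level=0):
        self.slide_path = slide_path
        self.coords = list(coords)
        self.patch_size = patch_size
        self.level = level
        # Callable that opens a slide from a path (assumption: e.g. tiffslide.TiffSlide).
        self.open_slide = open_slide
        self._slide = None       # no handle is created in the parent process
        self._owner_pid = None   # pid of the process that opened self._slide

    def _get_slide(self):
        # Reopen if this process has no handle yet, or inherited one via fork.
        pid = os.getpid()
        if self._slide is None or self._owner_pid != pid:
            self._slide = self.open_slide(self.slide_path)
            self._owner_pid = pid
        return self._slide

    def __len__(self):
        return len(self.coords)

    def __getitem__(self, idx):
        x, y = self.coords[idx]
        slide = self._get_slide()
        # OpenSlide-style call: read_region(location, level, size)
        img = slide.read_region((x, y), self.level,
                                (self.patch_size, self.patch_size))
        return img, (x, y)
```

With this layout each worker process opens its own handle on first access, so no file descriptor is shared between workers. Whether that removes the Jpeg8Error in this particular setup would need to be checked against the linked tiffslide issue.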