keiserlab / plaquebox-paper

Repo for Tang et al, bioRxiv 454793 (2018)
MIT License

Step 2.1 FileNotFoundError: [Errno 2] No such file or directory #1

Closed MonliH closed 4 years ago

MonliH commented 4 years ago

When running 2.1) CNN Models - Model Training and Development.ipynb on my Ubuntu 18.04 machine, in the 12th cell, I get the following error:

FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "<ipython-input-5-8e25f5046706>", line 32, in __getitem__
    img_as_img = Image.open(self.img_path + single_image_name)
  File "/home/user/.local/lib/python3.7/site-packages/PIL/Image.py", line 2766, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'data/seg/size_filtered/blobs/NA4757-02_AB/NA4757-02_AB_13_24_27.jpg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/user/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "<ipython-input-5-8e25f5046706>", line 34, in __getitem__
    img_as_img = Image.open(NEGATIVE_DIR + single_image_name)
  File "/home/user/.local/lib/python3.7/site-packages/PIL/Image.py", line 2766, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'data/seg/negatives/NA4757-02_AB/NA4757-02_AB_13_24_27.jpg'

This is most likely due to a faulty train.csv file or missing images in the data/seg/size_filtered/blobs/ directory. Reviewing the code, the error above is a result of the code attempting to load an image that does not exist:

def __getitem__(self, index):
    # Get label(class) of the image based on the cropped pandas column
    single_image_label = self.labels[index]
    raw_label = self.raw_labels[index]
    # Get image name from the pandas df
    single_image_name = str(self.data_info.loc[index,'imagename'])
    # Open image
    try:
        img_as_img = Image.open(self.img_path + single_image_name)
    except:
        img_as_img = Image.open(NEGATIVE_DIR + single_image_name)
    # Transform image to tensor
    if self.transform is not None:
        img_as_img = self.transform(img_as_img)
    # Return image and the label
    return (img_as_img, single_image_label, raw_label, single_image_name)
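One way to tell a faulty CSV apart from missing files might be to list every image name in the CSV that cannot be found in either directory. This is only a sketch: the helper name find_missing_images and the CSV location are placeholders of mine, while the imagename column and the two directories are taken from the code and traceback above.

```python
import csv
import os


def find_missing_images(csv_path, image_dirs, column="imagename"):
    """Return image names listed in the CSV that exist in none of image_dirs."""
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            name = row[column]
            # Same lookup order as the notebook: positives first, then negatives.
            if not any(os.path.isfile(os.path.join(d, name)) for d in image_dirs):
                missing.append(name)
    return missing
```

Calling it with the path to train.csv (wherever that file lives in your checkout) and ['data/seg/size_filtered/blobs/', 'data/seg/negatives/'] should return the names that would trigger the error, including NA4757-02_AB/NA4757-02_AB_13_24_27.jpg if it really is absent.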

I am not sure of the exact cause of this problem, though. I am trying to reproduce your results with your code. Any ideas?

mjke commented 4 years ago

Thanks for giving this a shot, glad to hear it's of interest. I'll defer to @ZiqiTang919 for a more in-depth reply, but meanwhile, just to check: did you encounter this error despite having already downloaded the accompanying image dataset from Zenodo?

MonliH commented 4 years ago

Thanks for the reply. Yes, I have downloaded (also unzipped and copied to /data folder) the data from the zenodo archive.

ZiqiTang919 commented 4 years ago

Have you run the notebooks 1.1) - 1.3) to generate all images which should be located in the data/seg/size_filtered/blobs/ directory?

MonliH commented 4 years ago

Yes, I have run those notebooks (I even tried rerunning them when it didn't work). The data/seg/size_filtered/blobs/ directory is filled with sub-directories that have images in them.

ZiqiTang919 commented 4 years ago

Can you check whether the image_details.csv file contains the information of the image NA4757-02_AB_13_24_27.jpg?

MonliH commented 4 years ago

Searching in the image_details.csv, I found NA4757-02_AB_13_24_27.jpg. It has the following:

NA4757-02_AB_13_24_27.jpg,NA4757-02_AB,24,13,[ 271 1118  256  256],[ 388 1232   22   28],408

ZiqiTang919 commented 4 years ago

It seems that the plaque detection step produced different results from ours, so this image was filtered out because of its small plaque size. Can you check whether the package versions in your environment match the ones we list? Especially libopencv, opencv, py-opencv, and pyvips.
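As a rough way to check this from inside the notebook's Python environment, something like the following could print the versions that are actually importable. It is a sketch only: report_versions is a hypothetical helper, and it sees Python modules (cv2, pyvips), not conda-level packages such as libopencv, which would have to be checked with conda itself.

```python
import importlib


def report_versions(module_names):
    """Map each module name to its __version__, or None if unavailable."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            # OSError covers bindings whose native library fails to load.
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", None)
    return versions


# cv2 is the import name of the OpenCV Python bindings.
for name, version in report_versions(["cv2", "pyvips"]).items():
    print(name, version if version is not None else "not importable or no __version__")
```

The printed values can then be compared by hand against the versions listed for the study.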

MonliH commented 4 years ago

Wow, it looks like my pyvips version is 2.1.8 (it should be 2.1.2), and I don't have the libopencv or py-opencv modules at all. By opencv, I'm guessing you mean opencv-python, which I have (also in the wrong version).

EDIT: Perhaps your py-opencv is opencv-python.

MonliH commented 4 years ago

That said, the libopencv, py-opencv, and opencv modules don't seem to be installable with pip, e.g. python3.7 -m pip install libopencv doesn't work.

ZiqiTang919 commented 4 years ago

We recommend creating a new Anaconda (https://www.anaconda.com/) environment and then installing the dependencies using conda. For example, you may install py-opencv using

conda install -c anaconda py-opencv

MonliH commented 4 years ago

Ok, thanks for the help. I will try this again (with the right dependencies) and get back to you then!

MonliH commented 4 years ago

Hi, would I need the 8.2.2-1 version of the libvips library for this to work, or just the correct pyvips version? The libvips binaries for that version are no longer in the package repository (they were replaced with newer ones), and building from source is cumbersome (it's not working for me). Is there a way you can provide the already filtered and cropped files instead of the whole slide images?

ZiqiTang919 commented 4 years ago

Yes, in the Zenodo release there is a folder named tiles which contains the filtered images we used in the study. To use that you may need to change the DATA_DIR in the second cell of the notebook 2.1). Also, please note that in the tiles directory negative images are not stored in a separate folder, instead, they are named with a prefix 'neg_'.
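A minimal sketch of how the lookup could handle that naming, assuming the 'neg_' prefix is applied to the file name inside each slide folder. resolve_tile_path is a hypothetical helper, not code from the notebook:

```python
import os


def resolve_tile_path(data_dir, image_name):
    """Return the on-disk path for image_name, falling back to a 'neg_' prefix."""
    plain = os.path.join(data_dir, image_name)
    if os.path.isfile(plain):
        return plain
    # Negatives live in the same slide folder, with 'neg_' added to the file name.
    folder, filename = os.path.split(image_name)
    prefixed = os.path.join(data_dir, folder, "neg_" + filename)
    if os.path.isfile(prefixed):
        return prefixed
    raise FileNotFoundError(f"neither {plain} nor {prefixed} exists")
```

Inside __getitem__, the result could be passed to Image.open in place of the try/except around the two directories.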

MonliH commented 4 years ago

I tried the Tiles folder with DATA_DIR = 'data/Tiles/train/' and NEGATIVE_DIR = 'data/Tiles/train/', adding the neg_ prefix to the file name when needed:

try:
    img_as_img = Image.open(self.img_path + single_image_name)
except:
    img_as_img = Image.open(NEGATIVE_DIR + "/neg_".join(single_image_name.split("/")))

With the above code, I get through the 12th and 13th cells, but not long after training starts I get another file-not-found error:

FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "<ipython-input-5-40e972fd8855>", line 32, in __getitem__
    img_as_img = Image.open(self.img_path + single_image_name)
  File "/home/user/.local/lib/python3.7/site-packages/PIL/Image.py", line 2766, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'data/Tiles/train/NA_4896_02_AB17-24/NA_4896_02_AB17-24_5_25_4.jpg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/user/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "<ipython-input-5-40e972fd8855>", line 34, in __getitem__
    img_as_img = Image.open(NEGATIVE_DIR + "/neg_".join(single_image_name.split("/")))
  File "/home/user/.local/lib/python3.7/site-packages/PIL/Image.py", line 2766, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'data/Tiles/train/NA_4896_02_AB17-24/neg_NA_4896_02_AB17-24_5_25_4.jpg'

ZiqiTang919 commented 4 years ago

There are four WSIs that are in the validation folder. NA_4896_02_AB17 is one of them. You may need to move these images to the train folder.
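If it helps, the move could be scripted roughly as below. This is a sketch under assumptions: move_slide_dirs is a hypothetical helper, the folder roots and slide folder names are whatever they are on your disk (from your traceback, the folder for this slide appears to be NA_4896_02_AB17-24), and the other three slide names are not listed here, so they would need to be filled in.

```python
import os
import shutil


def move_slide_dirs(src_root, dst_root, slide_names):
    """Move each named slide folder from src_root into dst_root; return those moved."""
    moved = []
    for name in slide_names:
        src = os.path.join(src_root, name)
        dst = os.path.join(dst_root, name)
        if not os.path.isdir(src):
            continue  # nothing to move for this slide
        os.makedirs(dst, exist_ok=True)
        # Move file by file so an existing destination folder is merged into.
        for fname in os.listdir(src):
            shutil.move(os.path.join(src, fname), os.path.join(dst, fname))
        os.rmdir(src)
        moved.append(name)
    return moved
```

The return value lists the slide folders that were actually found and moved, which makes it easy to spot a misspelled folder name.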

MonliH commented 4 years ago

Sorry for the delayed response, that got it working!