mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0

Eval.py giving UnboundLocalError: local variable 'data' referenced before assignment #132

Closed: DeepLearningCoder closed this issue 2 years ago

DeepLearningCoder commented 2 years ago

I was able to train my models successfully with main.py, but I'm now trying to use eval.py on an external dataset and getting this error: UnboundLocalError: local variable 'data' referenced before assignment

Is there something I'm entering wrong in the terminal command or eval.py script?

Here is what I put for eval.py lines 73-82:

if args.task == 'task_1_tumor_vs_normal':
    args.n_classes=2
    dataset = Generic_MIL_Dataset(csv_path = 'dataset_csv/test.csv',
                                  data_dir= os.path.join(args.data_root_dir, 'Features'),
                                  shuffle = False,
                                  print_info = True,
                                  label_dict = {'normal_tissue':0, 'tumor_tissue':1}, 
                                  patient_strat=False,
                                  ignore=[])

Here are my inputs and outputs:

python eval.py --drop_out --k 10 --models_exp_code task_1_tumor_vs_normal_CLAM_80_s1 --save_exp_code task_1_tumor_vs_normal_CLAM_80_s1_cv --task task_1_tumor_vs_normal --model_type clam_sb --results_dir results --data_root_dir "C:\Users\user\PycharmProjects\melanoma_pathology_clam\slide images"

{'task': 'task_1_tumor_vs_normal', 'split': 'test', 'save_dir': './eval_results\\EVAL_task_1_tumor_vs_normal_CLAM_80_s1_cv', 'models_dir': 'results\\task_1_tumor_vs_normal_CLAM_80_s1', 'model_type': 'clam_sb', 'drop_out': True, 'model_size': 'small'}
label column: label
label dictionary: {'normal_tissue': 0, 'tumor_tissue': 1}
number of classes: 2
slide-level counts:
 0    12
1     8
Name: label, dtype: int64
Patient-LVL; Number of samples registered in class 0: 12
Slide-LVL; Number of samples registered in class 0: 12
Patient-LVL; Number of samples registered in class 1: 8
Slide-LVL; Number of samples registered in class 1: 8
Init Model
CLAM_SB(
  (attention_net): Sequential(
    (0): Linear(in_features=1024, out_features=512, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.25, inplace=False)
    (3): Attn_Net_Gated(
      (attention_a): Sequential(
        (0): Linear(in_features=512, out_features=256, bias=True)
        (1): Tanh()
        (2): Dropout(p=0.25, inplace=False)
      )
      (attention_b): Sequential(
        (0): Linear(in_features=512, out_features=256, bias=True)
        (1): Sigmoid()
        (2): Dropout(p=0.25, inplace=False)
      )
      (attention_c): Linear(in_features=256, out_features=1, bias=True)
    )
  )
  (classifiers): Linear(in_features=512, out_features=2, bias=True)
  (instance_classifiers): ModuleList(
    (0): Linear(in_features=512, out_features=2, bias=True)
    (1): Linear(in_features=512, out_features=2, bias=True)
  )
  (instance_loss_fn): CrossEntropyLoss()
)
Total number of parameters: 790791
Total number of trainable parameters: 790791
Init Loaders
{'task': 'task_1_tumor_vs_normal', 'split': 'test', 'save_dir': './eval_results\\EVAL_task_1_tumor_vs_normal_CLAM_80_s1_cv', 'models_dir': 'results\\task_1_tumor_vs_normal_CLAM_80_s1', 'model_type': 'clam_sb', 'drop_out': True, 'model_size': 'small'}
label column: label
label dictionary: {'normal_tissue': 0, 'tumor_tissue': 1}
number of classes: 2
slide-level counts:  
 0    12
1     8
Name: label, dtype: int64
Patient-LVL; Number of samples registered in class 0: 12
Slide-LVL; Number of samples registered in class 0: 12
Patient-LVL; Number of samples registered in class 1: 8
Slide-LVL; Number of samples registered in class 1: 8
Traceback (most recent call last):
  File "eval.py", line 135, in <module>
    model, patient_results, test_error, auc, df  = eval(split_dataset, args, ckpt_paths[ckpt_idx])
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\utils\eval_utils.py", line 53, in eval
    patient_results, test_error, auc, df, _ = summary(model, loader, args)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\utils\eval_utils.py", line 89, in summary
    del data
UnboundLocalError: local variable 'data' referenced before assignment
fedshyvana commented 2 years ago

try specifying --split all

I think the issue is that the eval script by default looks for the "test" split in the csv file you provide. For an external test set, where you need to evaluate on every case in the csv file, specifying --split all will use the whole dataset.
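For anyone hitting the same traceback: the UnboundLocalError is most likely a symptom of a loader that yields no batches, not of a malformed command. The traceback shows a `del data` at eval_utils.py line 89, after the loop over the loader. The sketch below reproduces that pattern in isolation. `summarize` and `run` are hypothetical stand-ins written for illustration, not the CLAM functions.

```python
def summarize(loader):
    """Hypothetical stand-in for a loop that cleans up its loop variable."""
    count = 0
    for data, label in loader:
        count += 1
    # If the loader yielded nothing, 'data' was never bound, so this raises
    # UnboundLocalError, matching the error in the traceback above.
    del data
    return count


def run(loader):
    """Call summarize and turn the empty-loader failure into a readable message."""
    try:
        return summarize(loader)
    except UnboundLocalError as err:
        return "empty loader: {}".format(err)


print(run([("features", 0), ("features", 1)]))  # two batches, loop body runs
print(run([]))                                  # no batches, 'del data' fails
```

If the second case matches what you see, check how many slides end up in the split being evaluated before looking anywhere else.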

DeepLearningCoder commented 2 years ago

Now I'm getting an error that says it can't find image1.pt but I don't have anything named image1.pt in the dataset csv. Here is my output:

Traceback (most recent call last):
  File "eval.py", line 135, in <module>
    model, patient_results, test_error, auc, df  = eval(split_dataset, args, ckpt_paths[ckpt_idx])
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\utils\eval_utils.py", line 53, in eval
    patient_results, test_error, auc, df, _ = summary(model, loader, args)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\utils\eval_utils.py", line 70, in summary
    for batch_idx, (data, label) in enumerate(loader):
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\dataloader.py", line 652, in __next__
    data = self._next_data()
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\dataloader.py", line 1373, in _process_data
    data.reraise()
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\_utils.py", line 461, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\_utils\worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\datasets\dataset_generic.py", line 340, in __getitem__
    features = torch.load(full_path)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\serialization.py", line 699, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\user\\PycharmProjects\\melanoma_pathology_clam\\slide images\\Features\\pt_files\\image1.pt'
fedshyvana commented 2 years ago

but you have something named image1? it's looking for the corresponding features for each image in your csv file (.pt file)
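One way to check this before running eval.py is to list every id in the csv that has no matching feature file. This is a minimal sketch with some assumptions: the id column is taken to be named slide_id, and features are taken to live at <data_dir>/pt_files/<id>.pt, which is the layout the path in the traceback suggests. `missing_feature_files` is a hypothetical helper, not part of CLAM.

```python
import csv
import os


def missing_feature_files(csv_path, data_dir, id_column="slide_id"):
    """Return the ids in csv_path that have no <data_dir>/pt_files/<id>.pt file.

    The column name and directory layout are assumptions; adjust as needed.
    """
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            slide_id = row[id_column]
            full_path = os.path.join(data_dir, "pt_files", "{}.pt".format(slide_id))
            if not os.path.isfile(full_path):
                missing.append(slide_id)
    return missing
```

Calling it with the same csv_path and data_dir that eval.py uses, and printing os.path.abspath(csv_path) alongside, also confirms which csv file is actually being read.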

DeepLearningCoder commented 2 years ago

I don't have any file named image1 and it is not listed in any csv file that I've used. I'm not sure why it's looking for an image1

fedshyvana commented 2 years ago

are you sure? is "dataset_csv/test.csv" perhaps not the right file then?

DeepLearningCoder commented 2 years ago

yes, I've looked through the entire test.csv. I could email you my test.csv file too if that would be helpful

fedshyvana commented 2 years ago

sure if you email me i can take a quick look

DeepLearningCoder commented 2 years ago

Thanks! Sorry, I just noticed that I was editing a file called test.csv.csv. My new computer still has file extensions hidden, so the script was reading a different file that is actually named test.csv. Would you be able to help me with one more error? It says it can't find one of the images, but I'm sure it's there this time. I hit the same error when I manually put this image in the eval column of the split file while training with main.py.

Traceback (most recent call last):
  File "eval.py", line 135, in <module>
    model, patient_results, test_error, auc, df  = eval(split_dataset, args, ckpt_paths[ckpt_idx])
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\utils\eval_utils.py", line 53, in eval
    patient_results, test_error, auc, df, _ = summary(model, loader, args)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\utils\eval_utils.py", line 70, in summary
    for batch_idx, (data, label) in enumerate(loader):
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\dataloader.py", line 652, in __next__
    data = self._next_data()
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\dataloader.py", line 1373, in _process_data
    data.reraise()
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\_utils.py", line 461, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\_utils\worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\datasets\dataset_generic.py", line 340, in __getitem__
    features = torch.load(full_path)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\serialization.py", line 699, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\Users\user\PycharmProjects\melanoma_pathology_clam\melanoma_pathology_clam\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\user\\PycharmProjects\\melanoma_pathology_clam\\slide images\\Features\\pt_files\\YMTA3671_31779.pt'
fedshyvana commented 2 years ago

i mean - it's basically saying "C:\Users\user\PycharmProjects\melanoma_pathology_clam\slide images\Features\pt_files\YMTA3671_31779.pt" does not exist - can you confirm? I don't think there's much i can do to help if it's errors due to filename parsing/locating files. To me it seems the code should be behaving as expected...

Otherwise I'm happy to look into things if you suspect any strange behavior with the code.

DeepLearningCoder commented 2 years ago

Thank you, I understand. I just confirmed that the file does exist. Do you think it has anything to do with the slashes being read as double slashes? If not, maybe I'll just try reprocessing the images or something.

fedshyvana commented 2 years ago

yeah could be - I'm not familiar with how file paths on Windows are interpreted. if it helps: https://github.com/mahmoodlab/CLAM/blob/1a92ef234411b44ec9ba27551307aea1143b5b4e/datasets/dataset_generic.py#L339 is where the file path is built and used to find the .pt file. You could try running that exact line outside the dataset class/eval script to see whether it correctly locates the .pt file in question, then modify the code/filenames accordingly if necessary.
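A sketch of that check, under the assumption that the path is built as <data_dir>/pt_files/<slide_id>.pt, as the traceback path suggests. `locate_features` is a hypothetical helper, not CLAM code. If the file is missing, it lists look-alike names, which can expose a doubled extension (as with test.csv.csv earlier in this thread) or stray whitespace. The last lines show that the doubled backslashes in the error message are only string escaping, not part of the path.

```python
import os


def locate_features(data_dir, slide_id):
    """Rebuild the feature path and report look-alike files if it is missing."""
    folder = os.path.join(data_dir, "pt_files")
    full_path = os.path.join(folder, "{}.pt".format(slide_id))
    if os.path.isfile(full_path):
        return full_path, []
    # List entries containing the id; repr() makes stray spaces or a doubled
    # extension such as 'name.pt.pt' visible.
    entries = os.listdir(folder) if os.path.isdir(folder) else []
    look_alikes = sorted(repr(name) for name in entries if slide_id.strip() in name)
    return None, look_alikes


# The error message prints the repr of the path, which escapes each backslash.
example = "slide images\\Features"
print(example)        # slide images\Features
print(repr(example))  # 'slide images\\Features'
```

Calling locate_features with the same --data_root_dir plus Features and the id YMTA3671_31779 would show whether Python can see the file, independently of the dataset class.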