jannisborn / covid19_ultrasound

Open source lung ultrasound (LUS) data collection initiative for COVID-19.
https://www.mdpi.com/2076-3417/11/2/672

Errors when trying to test the model #103

Closed: rafaelblevin821 closed this issue 2 years ago

rafaelblevin821 commented 2 years ago

Hi, I'm running into an error when trying to test the model.

test.py: error: unrecognized arguments: [-h] [--data DATA] [--weights WEIGHTS] [--m_id M_ID] [--classes CLASSES] [--folds FOLDS] [--save_path SAVE_PATH]

I then tried python scripts/test.py, as per @nickdnickd's suggestion, but got the following error:

------------- SPLIT  0 -------------------
2022-02-05 17:25:37.254420: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/david/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-02-05 17:25:37.256741: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-02-05 17:25:37.258574: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (JXVL203): /proc/driver/nvidia/version does not exist
2022-02-05 17:25:37.262249: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/david/covid19_ultrasound/pocovidnet/pocovidnet/evaluate_covid19.py", line 99, in __init__
    model.load_weights(path)
  File "/home/david/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/david/venv/lib/python3.8/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 31, in error_translator
    raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for trained_models/fold_0/best_weights/variables/variables

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/test.py", line 240, in <module>
    main()
  File "scripts/test.py", line 195, in main
    model = Evaluator(
  File "/home/david/covid19_ultrasound/pocovidnet/pocovidnet/evaluate_covid19.py", line 101, in __init__
    raise Exception('Error in model restoring.')
Exception: Error in model restoring.

Is a dedicated GPU necessary for this?

Any ideas on how I can rectify this? Thanks.

nickdnickd commented 2 years ago

Hey @rafaelblevin821, I went back and ran it again on my end. This was the command that worked for me locally: python scripts/test.py --weights models/test --classes 3 --folds 1

Explanation:

Before running this, try ls models/test to make sure that you have models there. I think the script assumes 5 folds by default. Do you remember how many folds you trained with?
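A quick way to do that check and to see how many folds were actually trained is sketched below. It assumes the directory layout implied by the traceback above (one fold_<i>/best_weights entry per fold under the weights directory), which is not confirmed here. count_trained_folds is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def count_trained_folds(weights_dir):
    """Count consecutive fold_<i> subdirectories that contain best_weights.

    Assumed layout (inferred from the traceback, not confirmed):
    <weights_dir>/fold_0/best_weights, <weights_dir>/fold_1/best_weights, ...
    """
    root = Path(weights_dir)
    if not root.is_dir():
        return 0
    n = 0
    # Folds are assumed to be numbered consecutively, starting at 0.
    while (root / f"fold_{n}" / "best_weights").exists():
        n += 1
    return n


if __name__ == "__main__":
    n_folds = count_trained_folds("models/test")
    print(f"Found {n_folds} trained fold(s) under models/test")
```

If the count is lower than 5, passing that number to --folds is probably what you want.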

rafaelblevin821 commented 2 years ago

Hi @nickdnickd,

RE: python scripts/test.py --weights models/test --classes 3 --folds 1

I ran the command and seem to have gotten results, but also some errors. Any help is greatly appreciated.

Results:

------------- SPLIT  0 -------------------
2022-02-05 18:07:00.159015: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/david/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-02-05 18:07:00.160538: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-02-05 18:07:00.163429: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (JXVL203): /proc/driver/nvidia/version does not exist
2022-02-05 18:07:00.164869: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Model restored. Class mappings are ['covid', 'pneumonia', 'regular']
testing on n_files: 625
              precision    recall  f1-score   support

       covid       0.60      0.39      0.48       147
   pneumonia       0.43      1.00      0.60       206
     regular       1.00      0.19      0.32       272

    accuracy                           0.51       625
   macro avg       0.68      0.53      0.47       625
weighted avg       0.72      0.51      0.45       625

Traceback (most recent call last):
  File "scripts/test.py", line 240, in <module>
    main()
  File "scripts/test.py", line 227, in main
    evaluate_logits(
  File "scripts/test.py", line 72, in evaluate_logits
    gt_s = saved_gt[s]
IndexError: list index out of range
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.iter
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.beta_1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.beta_2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.decay
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.learning_rate
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).keras_api.metrics.0.total
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).keras_api.metrics.0.count
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).keras_api.metrics.1.total
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).keras_api.metrics.1.count
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-12.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-12.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-13.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-13.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-14.gamma
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-14.beta
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-15.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-15.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-12.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-12.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-13.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-13.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-14.gamma
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-14.beta
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-15.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-15.bias

nickdnickd commented 2 years ago

Yep! This is expected. With a number of folds other than 5 you will run into this error. I have a one-line change that makes this work, which I can submit as a PR. It uses a heuristic: loop only over the length of the logits instead of the default of 5.
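The idea behind that change can be sketched roughly as follows. This is not the code from the PR: iter_folds and saved_logits are hypothetical names used for illustration, and only saved_gt and the index s appear in the traceback.

```python
def iter_folds(saved_logits, saved_gt):
    """Yield (fold_index, logits, ground_truth) for every fold that was saved."""
    # Use the number of folds actually present instead of a hard-coded 5,
    # so that indexing saved_gt[s] cannot run past the end of the list.
    n_folds = min(len(saved_logits), len(saved_gt))
    for s in range(n_folds):
        yield s, saved_logits[s], saved_gt[s]


# Toy data for a single fold; a fixed range(5) would raise IndexError at s == 1.
saved_logits = [[[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]]
saved_gt = [[1, 0]]

for s, logits_s, gt_s in iter_folds(saved_logits, saved_gt):
    # Predicted class is the index of the largest logit in each row.
    predictions = [row.index(max(row)) for row in logits_s]
    print(s, predictions, gt_s)
```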

rafaelblevin821 commented 2 years ago

Great, thank you. Please let me know when you have submitted the PR. :)

nickdnickd commented 2 years ago

Submitted the below PR. Even if it's not totally acceptable, you can get the idea for how to push through with only one fold:

https://github.com/jannisborn/covid19_ultrasound/pull/104/files

rafaelblevin821 commented 2 years ago

Thanks, @nickdnickd. I updated test.py with your fix and seem to have gotten more results, but there are still some errors, which may be unrelated to your code and possibly related to plugins on my laptop.

------------- SPLIT  0 -------------------
2022-02-05 18:31:12.619038: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/david/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-02-05 18:31:12.619849: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-02-05 18:31:12.621459: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (JXVL203): /proc/driver/nvidia/version does not exist
2022-02-05 18:31:12.624992: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Model restored. Class mappings are ['covid', 'pneumonia', 'regular']
testing on n_files: 625
              precision    recall  f1-score   support

       covid       0.60      0.39      0.48       147
   pneumonia       0.43      1.00      0.60       206
     regular       1.00      0.19      0.32       272

    accuracy                           0.51       625
   macro avg       0.68      0.53      0.47       625
weighted avg       0.72      0.51      0.45       625

Average scores in cross validation:
           Precision  Recall  F1-score    MCC  Specificity  Accuracy  Balanced
covid          0.604   0.395     0.477  0.371        0.921     0.506     0.529
pneumonia      0.432   1.000     0.603  0.391        0.353     0.506     0.529
regular        1.000   0.191     0.321  0.343        1.000     0.506     0.529
Standard deviations:
           Precision  Recall  F1-score  MCC  Specificity  Accuracy  Balanced
covid            0.0     0.0       0.0  0.0          0.0       0.0       0.0
pneumonia        0.0     0.0       0.0  0.0          0.0       0.0       0.0
regular          0.0     0.0       0.0  0.0          0.0       0.0       0.0
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.iter
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.beta_1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.beta_2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.decay
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.learning_rate
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).keras_api.metrics.0.total
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).keras_api.metrics.0.count
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).keras_api.metrics.1.total
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).keras_api.metrics.1.count
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-12.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-12.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-13.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-13.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-14.gamma
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-14.beta
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-15.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'm' for (root).layer_with_weights-15.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-12.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-12.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-13.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-13.bias
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-14.gamma
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-14.beta
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-15.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer's state 'v' for (root).layer_with_weights-15.bias

nickdnickd commented 2 years ago

@rafaelblevin821 Any time! I see these too and am also unsure.
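For what it's worth, the first warning names its own remedy: calling expect_partial() on the status object returned by the restore. The values listed are optimizer and metric state, which a test run does not appear to use. A minimal sketch, assuming the model is a Keras model whose load_weights returns such a status object for TensorFlow-format checkpoints (it returns None for HDF5 files); load_weights_for_inference is a hypothetical helper and is not in the repository:

```python
def load_weights_for_inference(model, path):
    """Load weights and silence 'unrestored values' warnings where possible.

    `model` is assumed to be a Keras model; anything with a compatible
    load_weights(path) method works for this sketch.
    """
    status = model.load_weights(path)
    # Only TensorFlow-format checkpoints return a status object, so guard
    # against None (or any object without expect_partial).
    if hasattr(status, "expect_partial"):
        status.expect_partial()
    return model
```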

jannisborn commented 2 years ago

Thanks @nickdnickd for proposing this fix!

@rafaelblevin821 the remaining entries in the log trace are warnings, not errors. The standard deviations are all zero in the final evaluation because you only have one fold.
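To illustrate the point about the zeros: with a single fold there is only one score per class, so there is no spread to measure. The sketch below uses the covid precision from the report above and the population standard deviation from Python's standard library. That the evaluation script aggregates in this way is an assumption, and summarize is a hypothetical helper.

```python
import statistics


def summarize(scores_per_fold):
    """Return (mean, population standard deviation) across folds."""
    mean = statistics.fmean(scores_per_fold)
    # Population standard deviation: defined even for a single value,
    # where every deviation from the mean is zero.
    std = statistics.pstdev(scores_per_fold)
    return mean, std


# One fold only, so the list holds a single score.
mean, std = summarize([0.604])
print(mean, std)
```

With several folds the same call would report a non-zero spread whenever the per-fold scores differ.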