Closed lxy-94 closed 7 years ago
Did you run train_sunnybrook.py as is? Also, check to see if the function cv2.fillPoly is correctly reading in the contour files from the Sunnybrook dataset by inserting some print lines. I'm using OpenCV 3.1.
@vuptran I am having the same / similar problem. The result after the for loop at line 145 (in train_sunnybrook.py) is simply [nan, 0.0, nan, nan].
In response to your question about cv2.fillPoly: yes, that seems to work as intended.
Running on Linux Mint 18.2, 64-bit. Some file rearranging / renaming was necessary in the Sunnybrook dataset, but otherwise the code runs fine; it just gives nan results as early as epoch 1.
I'll take a look at this. It didn't happen in my environment listed in the README.md. In the meantime, could you reduce the learning rate specified in fcn_model.py to see if nan still appears?
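As an aside on why lowering the learning rate is a reasonable first test: with a step size that is too large, plain gradient descent can diverge to nan even on a trivial problem. A self-contained illustration (not code from this repository):

```python
import numpy as np

def gradient_descent(lr, steps=100):
    """Minimize f(w) = w**4 with plain gradient descent; returns the final w."""
    w = np.float64(3.0)
    for _ in range(steps):
        w -= lr * 4 * w ** 3  # gradient of w**4 is 4 * w**3
    return w

print(gradient_descent(lr=1.0))    # diverges: overflows to inf, then nan
print(gradient_descent(lr=0.001))  # converges toward 0
```

If nan persists even at very small learning rates, the problem is more likely a degenerate input (e.g. empty masks) or a division by zero inside the loss/metric than an unstable step size.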
Yes, it does. I've tried down to lr = 0.00001. Here's a typical run:
tasos@tasos-VanB ~/Desktop/cardiac-segmentation-master $ ./train_sunnybrook.py i 0
/usr/local/lib/python2.7/dist-packages/dicom/__init__.py:53: UserWarning:
This code is using an older version of pydicom, which is no longer
maintained as of Jan 2017. You can access the new pydicom features and API
by installing `pydicom` from PyPI.
See 'Transitioning to pydicom 1.x' section at pydicom.readthedocs.org
for more information.
warnings.warn(msg)
Using TensorFlow backend.
Mapping ground truth i contours to images in train...
Shuffling data
Number of examples: 260
Done mapping training set
Building Train dataset ...
Processing 234 images and labels ...
Building Dev dataset ...
Processing 26 images and labels ...
2017-09-04 05:11:25.356507: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-04 05:11:25.356530: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-04 05:11:25.356534: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-04 05:11:25.356537: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-04 05:11:25.356540: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Main Epoch 1
Learning rate: 0.000010
Train result ['loss', 'acc', 'dice_coef', 'jaccard_coef']:
[ nan 0.02561367 nan nan]
Evaluating dev set ...
26/26 [==============================] - 4s
Dev set result ['loss', 'acc', 'dice_coef', 'jaccard_coef']:
[ nan 0. nan nan]
Saving model weights to model_logs/sunnybrook_i_epoch_1.h5
Main Epoch 2
Learning rate: 0.000010
Train result ['loss', 'acc', 'dice_coef', 'jaccard_coef']:
[ nan 0. nan nan]
Evaluating dev set ...
26/26 [==============================] - 3s
Dev set result ['loss', 'acc', 'dice_coef', 'jaccard_coef']:
[ nan 0. nan nan]
Saving model weights to model_logs/sunnybrook_i_epoch_2.h5
Main Epoch 3
... etc for 40 epochs.
Environment:
Dear @vuptran, I have tried again using tensorflow-gpu / cuda / cudnn (instead of the standard tensorflow installation available through pip), and I can confirm that I now get the intended results (though, interestingly, not a contour identical to the one generated from the example weights provided with the code; am I right in thinking that one simply copies the result from the last epoch into the weights folder and renames it accordingly?).
Any idea why the code fails when run on the CPU? Is this a bug, or was your code written exclusively for GPU use? (I haven't spotted anything suggesting this in the code, but admittedly I haven't gone through it in much detail.)
I'm glad this worked out. Computation on CPU versus GPU should be handled automatically by TensorFlow; there is no explicit GPU declaration in the code. I have noticed that TensorFlow builds are very environment-specific. For example, TensorFlow compiled on a VM with a K80 GPU does not import properly when copied over to a different VM with a different card.
The packaged weights for the Sunnybrook model were produced by training the model on the entire training set, not split into train/val sets. This improves the model slightly as it learns from more data.
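Separately from the CPU/GPU question, a classic source of nan in Dice/Jaccard metrics is a 0/0 division when a batch contains no foreground pixels; a small smoothing constant guards against this. A numpy sketch of the idea (the actual dice_coef in fcn_model.py may be implemented differently, using Keras backend ops):

```python
import numpy as np

def dice_coef(y_true, y_pred, smooth=1.0):
    """Soft Dice coefficient with a smoothing term, so an all-background
    ground-truth/prediction pair yields 1.0 instead of nan (0/0)."""
    y_true = y_true.ravel().astype(np.float64)
    y_pred = y_pred.ravel().astype(np.float64)
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

empty = np.zeros((8, 8))
print(dice_coef(empty, empty))  # 1.0 rather than nan, thanks to smoothing
print(dice_coef(np.ones((8, 8)), np.ones((8, 8))))  # 1.0 for a perfect match
```

Without the smooth term, an empty slice would make the metric (and a Dice-based loss) nan, and a single nan propagates through the running averages for the whole epoch.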
Hello, I have read your paper, and after following your tutorial I tried to train on the Sunnybrook dataset on my computer. But after 40 epochs, the loss, accuracy, and Jaccard coefficient are the same as at epoch 1. What can I do to increase my accuracy when training on the Sunnybrook dataset? Thank you very much for your paper and tutorial; I look forward to your reply.