jackyko1991 / vnet-tensorflow

Implementation of vnet in tensorflow for medical image segmentation

tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence #13

Closed jefflgaol closed 4 years ago

jefflgaol commented 4 years ago

Hi, there! Amazing work you have here. But I have a question. I tried to run your main.py like this:

$ python3 main.py --config_json config.json --gpu 1

Unfortunately, the terminal showed several issues:

...
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
     [[{{node IteratorGetNext}}]]
...
OutOfRangeError (see above for traceback): End of sequence
     [[node IteratorGetNext (defined at /home/jeff/vnet-tensorflow/model.py:327) ]]
...

So I tried printing the paths produced by input_parser in NiftiDataset3D like this:

def input_parser(self,case):
        case = case.decode("utf-8")
        image_paths = []
        for channel in range(len(self.image_filenames)):
            image_paths.append(os.path.join(self.data_dir,case,self.image_filenames[channel]))
            print(os.path.join(self.data_dir,case,self.image_filenames[channel]))

and the result is also fine:

/home/jeff/vnet-tensorflow/data/training/case1/img.nii.gz
/home/jeff/vnet-tensorflow/data/training/case3/img.nii.gz
/home/jeff/vnet-tensorflow/data/training/case2/img.nii.gz

Do you have any insight into these issues? Note: I am currently using TensorFlow v1.13.1.

jackyko1991 commented 4 years ago

Check whether you have any data in the testing folder.
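A quick way to run that check is sketched below. It assumes the layout seen elsewhere in this thread (one sub-folder per case holding `img.nii.gz` and `label.nii.gz`); the helper name `list_complete_cases` is hypothetical and is not part of this repository.

```python
import os


def list_complete_cases(data_dir, image_filenames, label_filename):
    """Return the case folders under data_dir that contain every expected file.

    Hypothetical helper for illustration; it is not part of vnet-tensorflow.
    """
    if not os.path.isdir(data_dir):
        return []
    required = list(image_filenames) + [label_filename]
    complete = []
    for case in sorted(os.listdir(data_dir)):
        case_dir = os.path.join(data_dir, case)
        if not os.path.isdir(case_dir):
            continue
        # A case only counts if all image channels and the label are present.
        if all(os.path.isfile(os.path.join(case_dir, name)) for name in required):
            complete.append(case)
    return complete


if __name__ == "__main__":
    # Directory names follow the config.json posted in this thread.
    for folder in ("./data/training", "./data/testing"):
        cases = list_complete_cases(folder, ["img.nii.gz"], "label.nii.gz")
        print(folder, len(cases), cases)
```

An empty list for either folder would mean the dataset pipeline has nothing to read from it.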

jefflgaol commented 4 years ago

I also have data in testing and evaluation folders.

jackyko1991 commented 4 years ago

Can you post your config.json so I can take a look?

This error means the TensorFlow Dataset API cannot process the data properly. If possible, paste the full error output log.

jefflgaol commented 4 years ago

This is my config.json:

{
    "ProjectName": "VNet Tensorflow",
    "ProjectDetail": {
        "BodyPart": "Liver",
        "Diseases": "Lesion"
    },    
    "TrainingSetting": {
        "Data": {
            "TrainingDataDirectory":"./data/training",
            "TestingDataDirectory": "./data/testing",
            "ImageFilenames": ["img.nii.gz"],
            "LabelFilename": "label.nii.gz"
        },
        "Restore": true,
        "SegmentationClasses": [0,1,2],
        "LogDir": "./tmp/log",
        "CheckpointDir": "./tmp/ckpt",
        "BatchSize": 32,
        "PatchShape": [256,256,32],
        "ImageLog": false,
        "Testing": false,
        "TestStep": 30,
        "Epoches": 99999,
        "MaxIterations": 15000,
        "LogInterval": 25,
        "Networks": {
            "Name":"VNet",
            "Dropout": 0.01
        },
        "Loss": "weighted_sorensen",
        "Optimizer":{
            "Name": "Adam",
            "InitialLearningRate": 1e-2,
            "Momentum":0.9,
            "Decay":{
                "Factor": 0.99,
                "Steps": 100
            }
        },
        "Spacing": [0.75,0.75,0.75],
        "DropRatio": 0.01,
        "MinPixel":30
    },
    "EvaluationSetting":{
        "Data":{
            "EvaluateDataDirectory": "./data/evaluate",
            "ImageFilenames": ["img.nii.gz"],
            "LabelFilename": "label.nii.gz",
            "ProbabilityFilename": "probability_tf.nii.gz"
        },
        "CheckpointPath": "./tmp/ckpt/checkpoint-0",
        "Stride": [256,256,32],
        "BatchSize": 1,
        "ProbabilityOutput":false
    }
}

and this is my log:

python3 main.py --config_json  config.json --gpu 1
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2020-03-21 20:47:38.062457: Reading configuration file...
2020-03-21 20:47:38.062556: Reading configuration file complete
2020-03-21 20:47:38.062581: Start to build model graph...
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
iterator
WARNING:tensorflow:From /home/cwlab913/vnet-tensorflow/NiftiDataset3D.py:45: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.

2020-03-21 20:47:38.093575: Dataset pipeline complete
2020-03-21 20:47:38.093926: Core network complete
WARNING:tensorflow:From /home/cwlab913/vnet-tensorflow/networks.py:259: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.batch_normalization instead.
2020-03-21 20:47:40.752396: Output layers complete
2020-03-21 20:47:40.789966: Loss function complete
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/metrics_impl.py:1472: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2020-03-21 20:47:40.884969: Metrics complete
2020-03-21 20:47:40.885004: Build graph complete
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2020-03-21 20:47:48.697176: Start training...
2020-03-21 20:47:48.697257: Setting up Saver...
2020-03-21 20:47:49.023796: Last checkpoint epoch: 0
2020-03-21 20:47:49.404576: Last checkpoint global step: 0
2020-03-21 20:47:52.088255: Epoch 1 starts...
2020-03-21 20:47:52.617702: Set network to training ok
./data/training/case1/img.nii.gz
./data/training/case5/img.nii.gz
./data/training/case3/img.nii.gz
./data/training/case2/img.nii.gz
./data/training/case4/img.nii.gz
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
     [[{{node IteratorGetNext}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cwlab913/vnet-tensorflow/model.py", line 639, in train
    image, label = self.sess.run(self.next_element_train)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
     [[node IteratorGetNext (defined at /home/cwlab913/vnet-tensorflow/model.py:327) ]]

Caused by op 'IteratorGetNext', defined at:
  File "main.py", line 83, in <module>
    main(args)
  File "main.py", line 75, in main
    model.train()
  File "/home/cwlab913/vnet-tensorflow/model.py", line 542, in train
    self.build_model_graph()
  File "/home/cwlab913/vnet-tensorflow/model.py", line 327, in build_model_graph
    self.next_element_train = self.train_iterator.get_next()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 414, in get_next
    output_shapes=self._structure._flat_shapes, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1685, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): End of sequence
     [[node IteratorGetNext (defined at /home/cwlab913/vnet-tensorflow/model.py:327) ]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 83, in <module>
    main(args)
  File "main.py", line 75, in main
    model.train()
  File "/home/cwlab913/vnet-tensorflow/model.py", line 698, in train
    print("{}: Training of epoch {} complete, epoch loss: {}".format(datetime.datetime.now(),epoch+1,loss_sum/count))
ZeroDivisionError: division by zero

jackyko1991 commented 4 years ago

I think you only have 5 images but your batch size is 32, so the data can't form even one full batch.

Change it to a smaller size such as 1. Since this is 3D training, a batch size that is too big will also consume a lot of GPU memory.
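The arithmetic behind this can be sketched in plain Python. The sketch assumes the input pipeline drops incomplete batches (for example, `tf.data.Dataset.batch` called with `drop_remainder=True`); the function names are illustrative and are not taken from model.py.

```python
def full_batches(num_samples, batch_size):
    """Number of complete batches when incomplete batches are dropped (assumed)."""
    return num_samples // batch_size


def mean_epoch_loss(batch_losses):
    """Average the per-batch losses; return None if the epoch produced no batch."""
    if not batch_losses:
        # Guards against the ZeroDivisionError seen at the end of the log above.
        return None
    return sum(batch_losses) / len(batch_losses)


print(full_batches(5, 32))  # 5 cases, batch size 32: no complete batch
print(full_batches(5, 1))   # 5 cases, batch size 1: five batches per epoch
```

With zero complete batches the iterator is exhausted on its first `get_next()` call, which matches the `OutOfRangeError: End of sequence` in the log. The batch count then stays at 0, so the epoch summary divides by zero.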

jefflgaol commented 4 years ago

Gosh! I forgot to set that to the correct batch size. Thank you very much!