jakeret / tf_unet

Generic U-Net Tensorflow implementation for image segmentation
GNU General Public License v3.0

Dst tensor is not initialized. How to replace feed_dict with an input pipeline? #227

Closed: meredithmjackson closed this issue 5 years ago

meredithmjackson commented 5 years ago

I would like to run this project on satellite images, but I am running into a memory error (disguised as 'InternalError: Dst tensor is not initialized.') caused by the size of the images. I have downloaded the repository and am editing the code to suit my needs. I am working with Sentinel-2 data: the images are of type uint16, have shape (10980, 10980), and have 4 channels (R, G, B and NIR). The images are .tif files, and I am using GDAL (instead of cv2 or PIL) to open them and convert them to float32 NumPy arrays.

Specifically, I am looking for help with replacing feed_dict with an input pipeline, such as the Dataset API. I am somewhat new to TensorFlow, so any help and guidance is appreciated.

InternalError                             Traceback (most recent call last)
~\Anaconda3\envs\geoML\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1277     try:
-> 1278       return fn(*args)
   1279     except errors.OpError as e:

~\Anaconda3\envs\geoML\lib\site-packages\tensorflow\python\client\session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1262       return self._call_tf_sessionrun(
-> 1263           options, feed_dict, fetch_list, target_list, run_metadata)
   1264

~\Anaconda3\envs\geoML\lib\site-packages\tensorflow\python\client\session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1349         self._session, options, feed_dict, fetch_list, target_list,
-> 1350         run_metadata)
   1351

InternalError: Dst tensor is not initialized.
  [[Node: _arg_x_0_1/_5 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_456__arg_x_0_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
  [[Node: results/pixel_wise_softmax/truediv/_9 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_458_results/pixel_wise_softmax/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

InternalError Traceback (most recent call last)

in ()
      4     epochs=1,
      5     dropout=0.4,
----> 6     display_step=2)

~\AppData\Roaming\Python\Python35\site-packages\tf_unet\unet.py in train(self, data_provider, output_path, training_iters, epochs, dropout, display_step, restore, write_graph, prediction_path)
    421
    422     test_x, test_y = data_provider(self.verification_batch_size)
--> 423     pred_shape = self.store_prediction(sess, test_x, test_y, "_init")
    424
    425     summary_writer = tf.summary.FileWriter(output_path, graph=sess.graph)

~\AppData\Roaming\Python\Python35\site-packages\tf_unet\unet.py in store_prediction(self, sess, batch_x, batch_y, name)
    461     prediction = sess.run(self.net.predicter, feed_dict={self.net.x: batch_x,
    462                                                          self.net.y: batch_y,
--> 463                                                          self.net.keep_prob: 1.})
    464     pred_shape = prediction.shape
    465

~\Anaconda3\envs\geoML\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    875     try:
    876       result = self._run(None, fetches, feed_dict, options_ptr,
--> 877                          run_metadata_ptr)
    878       if run_metadata:
    879         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\Anaconda3\envs\geoML\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1098     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1099       results = self._do_run(handle, final_targets, final_fetches,
-> 1100                              feed_dict_tensor, options, run_metadata)
   1101     else:
   1102       results = []

~\Anaconda3\envs\geoML\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1270     if handle is None:
   1271       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1272                            run_metadata)
   1273     else:
   1274       return self._do_call(_prun_fn, handle, feeds, fetches)

~\Anaconda3\envs\geoML\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1289         except KeyError:
   1290           pass
-> 1291       raise type(e)(node_def, op, message)
   1292
   1293   def _extend_graph(self):

InternalError: Dst tensor is not initialized.
  [[Node: _arg_x_0_1/_5 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_456__arg_x_0_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
  [[Node: results/pixel_wise_softmax/truediv/_9 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_458_results/pixel_wise_softmax/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

jakeret commented 5 years ago

Hi Meredith, this sounds like a very nice project! The images are indeed extremely large. I wonder whether the network would fit into your GPU's memory at full resolution even if you used the Dataset API. Have you considered running the training / prediction on image crops and then reassembling them?
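
The crop-and-reassemble idea could be sketched roughly as follows, using NumPy only. This is an illustration rather than anything from tf_unet: `split_into_tiles`, `reassemble` and the tile size are hypothetical names and values, and the "prediction" step is a stand-in identity function. It also assumes the per-tile output has the same height and width as the input tile; if the network returns a smaller output than its input, overlapping crops would be needed instead.

```python
import numpy as np


def split_into_tiles(image, tile_size):
    """Split an (H, W, C) array into non-overlapping square tiles.

    The image is zero-padded at the bottom/right so that both sides are
    multiples of tile_size. Returns a list of ((row, col), tile) pairs.
    (Hypothetical helper, not part of tf_unet.)
    """
    height, width = image.shape[:2]
    pad_h = (-height) % tile_size
    pad_w = (-width) % tile_size
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    tiles = []
    for row in range(0, padded.shape[0], tile_size):
        for col in range(0, padded.shape[1], tile_size):
            tile = padded[row:row + tile_size, col:col + tile_size]
            tiles.append(((row, col), tile))
    return tiles


def reassemble(tiles, height, width):
    """Stitch ((row, col), tile) pairs back together and crop off the padding."""
    first_tile = tiles[0][1]
    tile_size = first_tile.shape[0]
    padded_h = height + (-height) % tile_size
    padded_w = width + (-width) % tile_size
    out = np.zeros((padded_h, padded_w) + first_tile.shape[2:],
                   dtype=first_tile.dtype)
    for (row, col), tile in tiles:
        out[row:row + tile_size, col:col + tile_size] = tile
    return out[:height, :width]


# Small stand-in for a 4-channel scene; a real Sentinel-2 band stack is far larger.
image = np.arange(10 * 7 * 4, dtype=np.float32).reshape(10, 7, 4)
tiles = split_into_tiles(image, tile_size=4)

# Placeholder for the per-tile prediction (identity here, an assumption).
predicted = [(origin, tile.copy()) for origin, tile in tiles]

restored = reassemble(predicted, image.shape[0], image.shape[1])
```

Each tile can be fed to the network on its own, so only one crop has to sit in GPU memory at a time.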

meredithmjackson commented 5 years ago

I was trying to avoid doing that, but that may have to be the solution. I reduced the images by a factor of 25 in pixel count (5 along each axis) and removed the NIR channel, so they now have shape (2196, 2196, 3), and it is working now.
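
For a sense of scale, the raw size of one input array at the two shapes mentioned in this thread can be worked out as below. This is a back-of-the-envelope sketch only: `array_megabytes` is a hypothetical helper, float32 storage is assumed for both shapes, and the figures cover just the input array, not the network's intermediate activations, which typically need far more memory.

```python
import numpy as np


def array_megabytes(shape, dtype):
    """Size in MB (10**6 bytes) of a dense array with this shape and dtype."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize / 1e6


# Original Sentinel-2 scene as float32 with 4 channels: about 1.9 GB.
full_mb = array_megabytes((10980, 10980, 4), np.float32)

# Downsampled, 3-channel version from the last comment: about 58 MB.
reduced_mb = array_megabytes((2196, 2196, 3), np.float32)

# 10980 / 2196 = 5 per axis, i.e. a factor of 25 in pixel count.
scale_per_axis = 10980 // 2196

print(f"full: {full_mb:.1f} MB, reduced: {reduced_mb:.1f} MB")
```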