Closed GeorgeBohw closed 6 years ago
I am facing exactly the same problem. Reducing the number of training steps (--training_number_of_steps) helps. Note that you have to remove the checkpoint files before rerunning; otherwise training resumes from the previous checkpoint and ends with the same result.
It seems that this is really due to limited GPU memory. As stated in the documentation, setting --fine_tune_batch_norm=False solves this problem. I tried this option and can now train for 30,000 steps ;-)
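For anyone unsure how these options fit together, here is a minimal sketch that only assembles the argument list for train.py and prints it; it does not start training. The flag names --training_number_of_steps and --fine_tune_batch_norm come from this thread. The names --train_crop_size and --train_logdir, the "height,width" value format, and the log directory path are assumptions that should be checked against your checkout of train.py.

```python
# Sketch only: builds the command line, it does not launch training.
def build_train_args(steps, crop_size, train_logdir, fine_tune_batch_norm=False):
    """Return the argument list for a DeepLab train.py run."""
    return [
        "python", "train.py",
        "--training_number_of_steps={}".format(steps),
        "--fine_tune_batch_norm={}".format(fine_tune_batch_norm),
        # Assumption: the crop size is passed as "height,width". Older
        # checkouts may expect the flag repeated once per dimension instead.
        "--train_crop_size={},{}".format(crop_size, crop_size),
        "--train_logdir={}".format(train_logdir),
    ]


# Hypothetical log directory, used for illustration only.
args = build_train_args(30000, 513, "/tmp/deeplab_train")
print(" ".join(args))
```

Other required flags (dataset, model variant, initial checkpoint) are left out on purpose; append them to the list the same way.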
@walkerlala I set --fine_tune_batch_norm=False and crop_size=1000, and the same error occurs. What is wrong with it? What is your crop_size? I just want to run the model on large images like 720p. Is there any other solution? Thanks!
@walkerlala I think I can't fine-tune the pre-trained model using higher-resolution images. The only thing I can do is run the pre-trained model on higher-resolution images, and I don't know how to do that. It seems that the pre-trained model only fits images with a maximum size of 513*513.
My crop size is 513*513
@walkerlala Thanks for ur response!
@GeorgeBohw Does that solve your problem?
@walkerlala Yeah,it solved the problem,thanks~
I am facing a similar issue with SSD Inception V2. Can you help me solve it? How can I set the crop size, reduce training_number_of_steps, or change fine_tune_batch_norm? What exact solution worked for you?
@walkerlala I am also facing the same problem, but none of the solutions worked. What should I do? Thanks!
Delete all files in the train_logdir directory and start a new training run; it will work.
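A minimal sketch of that cleanup step, assuming train_logdir is a local directory that holds only checkpoints and event files you are willing to lose. The helper name clear_train_logdir is made up for illustration, and the demo runs against a throwaway temporary directory so nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def clear_train_logdir(train_logdir):
    """Delete everything inside train_logdir but keep the directory itself."""
    removed = []
    for entry in Path(train_logdir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # subdirectories, e.g. nested event folders
        else:
            entry.unlink()        # regular files and symlinks
        removed.append(entry.name)
    return sorted(removed)


# Demo on a temporary directory with two stand-in checkpoint files.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "checkpoint").write_text("stale")
(demo_dir / "model.ckpt-0.index").write_text("stale")
removed = clear_train_logdir(demo_dir)
print(removed, list(demo_dir.iterdir()))
```

Point the function at your own train_logdir only after confirming you do not need the old checkpoints.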
After I executed the train command and ran into this error, changing training_number_of_steps and fine_tune_batch_norm did not help. I then quite radically deleted the pascal_voc_seg folder and re-downloaded, unpacked, and installed voc2012 and deeplabv3_pascal_train_aug. After that, running train.py with the changed training_number_of_steps and fine_tune_batch_norm got rid of the problem. However, ybxbupt's measures might be sufficient.
@walkerlala Hello! My crop size is 257*257, and it works. But if I use a bigger size, 481*481 or 513*513, it doesn't work. Reducing the number of training steps (--training_number_of_steps) and the learning rate does not help. What should I do next to improve my result? Thanks!
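On choosing crop sizes: the sizes that appear in this thread (257, 481, 513) all have the form output_stride * k + 1, which is the convention the DeepLab FAQ describes for crop sizes, while 1000 does not. The sketch below checks a size against that form and computes the smallest such size that covers a given image dimension. It assumes output_stride=16, and both helper names are invented for illustration. It says nothing about GPU memory, so a valid size can still fail on a small card.

```python
def is_valid_crop_size(crop_size, output_stride=16):
    """True if crop_size has the form output_stride * k + 1 with k >= 1."""
    return crop_size > 1 and (crop_size - 1) % output_stride == 0


def smallest_crop_covering(image_dim, output_stride=16):
    """Smallest output_stride * k + 1 that is larger than image_dim."""
    k = -(-image_dim // output_stride)  # ceiling division
    return output_stride * k + 1


for size in (257, 481, 513, 1000):
    print(size, is_valid_crop_size(size))

# Candidate crop sizes for a 1920*1080 image (height first, then width).
print(smallest_crop_covering(1080), smallest_crop_covering(1920))
```

With these assumptions, 1000 is rejected because 999 is not a multiple of 16, and a 1920*1080 image would need a crop of 1089 by 1921.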
I have faced the same problem. Have you solved it, please? @Adnation
INFO:tensorflow:Error reported to Coordinator: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d _5_3x3_s2_128/BatchNorm/moving_variance [[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T =DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3s2 128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]] [[Node: cond_7/one_hot/_147 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/ job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_903_cond_7/one_hot", tensor_type=DT_INT32, _device="/ job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op 'ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance', defined at:
File "train.py", line 184, in
InvalidArgumentError (see above for traceback): Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2 d_5_3x3_s2_128/BatchNorm/moving_variance [[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T =DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3s2 128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]] [[Node: cond_7/one_hot/_147 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/ job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_903_cond_7/one_hot", tensor_type=DT_INT32, _device="/ job:localhost/replica:0/task:0/device:GPU:0"]()]] Traceback (most recent call last): File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call return fn(*args) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_poi ntwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance [[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T =DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3s2 128/BatchNorm/moving_variance/tag, 
FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]] [[Node: cond_7/one_hot/_147 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/ job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_903_cond_7/one_hot", tensor_type=DT_INT32, _device="/ job:localhost/replica:0/task:0/device:GPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\training\coordinator.py", line 297, in stop_on_exception yield File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\training\coordinator.py", line 495, in run self.run_loop() File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\training\supervisor.py", line 1035, in run_loop self._sv.global_step]) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 900, in run run_metadata_ptr) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run feed_dict_tensor, options, run_metadata) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run run_metadata) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_poi ntwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance [[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T =DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3s2 128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]] [[Node: cond_7/one_hot/_147 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/ job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_903_cond_7/one_hot", tensor_type=DT_INT32, _device="/ job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op 'ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance', defined at:
File "train.py", line 184, in
InvalidArgumentError (see above for traceback): Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2 d_5_3x3_s2_128/BatchNorm/moving_variance [[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T =DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3s2 128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]] [[Node: cond_7/one_hot/_147 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/ job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_903_cond_7/one_hot", tensor_type=DT_INT32, _device="/ job:localhost/replica:0/task:0/device:GPU:0"]()]]
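The error above only names the first variable whose histogram summary hit a NaN. To see how widespread the problem is, one option is to scan all variable values for NaN. The sketch below does that on a plain dictionary of name to list of floats; the example values are made up. In a real run the values would have to be read from the checkpoint first (for example with the reader returned by tf.train.load_checkpoint and its get_tensor method, flattened to a list), which is an assumption and is not exercised here.

```python
import math


def find_nan_variables(variables):
    """Return the sorted names of variables that contain at least one NaN."""
    return sorted(
        name
        for name, values in variables.items()
        if any(math.isnan(v) for v in values)
    )


# Hypothetical values standing in for tensors read from a checkpoint.
example = {
    "BatchNorm/moving_variance": [0.5, float("nan"), 1.2],
    "BatchNorm/moving_mean": [0.0, 0.1, -0.3],
}
bad = find_nan_variables(example)
print(bad)
```

If every variable in a saved checkpoint is already NaN, restarting from that checkpoint cannot recover, which is consistent with the advice in this thread to clear the old checkpoints before retraining.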
How do I solve this problem? Can anybody help me, please?
I ran into this problem as well. Can you share how you solved it? Was it with the methods mentioned above? I tried them, but it failed again.
Hi, I want to train on a custom dataset with SSD MobileNet V1 using tensorflow-gpu 1.15, and I am running into the issue below.
Relying on driver to perform ptx compilation. This message will be only logged once. 2022-02-04 10:42:44.152817: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0 INFO:tensorflow:Saving checkpoint to path train/model.ckpt I0204 10:43:59.338148 140653890541312 supervisor.py:1117] Saving checkpoint to path train/model.ckpt INFO:tensorflow:Recording summary at step 0. I0204 10:44:34.073986 140653915719424 supervisor.py:1050] Recording summary at step 0. INFO:tensorflow:Error reported to Coordinator: 2 root error(s) found. (0) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean [[node ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean (defined at /home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]] [[FeatureExtractor/MobilenetV1/Conv2d_9_depthwise/BatchNorm/gamma/read/_1521]] (1) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean [[node ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean (defined at /home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]] 0 successful operations. 0 derived errors ignored.
Original stack trace for 'ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean': File "train.py", line 186, in tf.app.run() File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef) File "/home/mlai/.local/lib/python3.7/site-packages/absl/app.py", line 303, in run _run_main(main, args) File "/home/mlai/.local/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main sys.exit(main(argv)) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func return func(*args, *kwargs) File "train.py", line 182, in main graph_hook_fn=graph_rewriter_fn) File "/home/mlai/.local/lib/python3.7/site-packages/object_detection/legacy/trainer.py", line 353, in train model_var.op.name, model_var)) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/summary/summary.py", line 179, in histogram tag=tag, values=values, name=scope) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 329, in histogram_summary "HistogramSummary", tag=tag, values=values, name=name) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper op_def=op_def) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func return func(args, *kwargs) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op attrs, op_def, compute_device) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal op_def=op_def) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in init self._traceback = tf_stack.extract_stack() Traceback (most 
recent call last): File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call return fn(args) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn target_list, run_metadata) File "/home/mlai/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found. (0) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean [[{{node ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean}}]] [[FeatureExtractor/MobilenetV1/Conv2d_9_depthwise/BatchNorm/gamma/read/_1521]] (1) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean [[{{node ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/BatchNorm/moving_mean}}]] 0 successful operations. 0 derived errors ignored.
During handling of the above exception, another exception occurred:
How can I solve this issue? Any input would be appreciated so that I can continue.
Thanks and Regards, abhishek-ml-ai
Describe the problem
When I run local_test.sh with only --***_crop_size changed to 1000, the following error occurs:
INFO:tensorflow:Error reported to Coordinator: Nan in summary histogram for: image_pooling/BatchNorm/moving_variance_1
Caused by op u'image_pooling/BatchNorm/moving_variance_1', defined at:
  File "/home/george/project/deeplabv3/models-master/research/deeplab/train.py", line 347, in
    tf.app.run()
  File "/home/george/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/george/project/deeplabv3/models-master/research/deeplab/train.py", line 268, in main
    summaries.add(tf.summary.histogram(model_var.op.name, model_var))
  File "/home/george/anaconda2/lib/python2.7/site-packages/tensorflow/python/summary/summary.py", line 193, in histogram
    tag=tag, values=values, name=scope)
  File "/home/george/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 189, in _histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/home/george/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/george/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/home/george/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
What is the reason? Is 1000 too large? If I want to use the model to test images of size 1920*1080, what should I do? I am looking forward to your response. Thank you!