```
2018-10-14 15:22:48.265058: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ***x****
2018-10-14 15:22:48.265078: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at conv_ops.cc:398 : Resource exhausted: OOM when allocating tensor with shape[64,672,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2018-10-14 15:22:58.265240: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 13.29MiB. Current allocation summary follows.
2018-10-14 15:22:58.265356: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (256): Total Chunks: 1412, Chunks in use: 1406. 353.0KiB allocated for chunks. 351.5KiB in use in bin. 37.0KiB client-requested in use in bin.
2018-10-14 15:22:58.265382: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (512): Total Chunks: 263, Chunks in use: 251. 186.2KiB allocated for chunks. 178.5KiB in use in bin. 151.8KiB client-requested in use in bin.
2018-10-14 15:22:58.265391: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (1024): Total Chunks: 341, Chunks in use: 268. 507.8KiB allocated for chunks. 397.5KiB in use in bin. 346.6KiB client-requested in use in bin.
2018-10-14 15:22:58.265412: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (2048): Total Chunks: 346, Chunks in use: 336. 951.5KiB allocated for chunks. 923.5KiB in use in bin. 882.0KiB client-requested in use in bin.
2018-10-14 15:22:58.265433: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (4096): Total Chunks: 94, Chunks in use: 92. 564.0KiB allocated for chunks. 555.5KiB in use in bin. 546.3KiB client-requested in use in bin.
2018-10-14 15:22:58.265455: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (8192): Total Chunks: 102, Chunks in use: 102. 1.19MiB allocated for chunks. 1.19MiB in use in bin. 1.17MiB client-requested in use in bin.
2018-10-14 15:22:58.265462: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (16384): Total Chunks: 158, Chunks in use: 158. 3.33MiB allocated for chunks. 3.33MiB in use in bin. 3.30MiB client-requested in use in bin.
2018-10-14 15:22:58.265483: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (32768): Total Chunks: 65, Chunks in use: 63. 2.30MiB allocated for chunks. 2.20MiB in use in bin. 2.18MiB client-requested in use in bin.
2018-10-14 15:22:58.265503: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (65536): Total Chunks: 185, Chunks in use: 184. 17.11MiB allocated for chunks. 17.02MiB in use in bin. 17.01MiB client-requested in use in bin.
2018-10-14 15:22:58.265509: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (131072): Total Chunks: 50, Chunks in use: 50. 8.28MiB allocated for chunks. 8.28MiB in use in bin. 8.23MiB client-requested in use in bin.
2018-10-14 15:22:58.265529: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (262144): Total Chunks: 142, Chunks in use: 142. 60.96MiB allocated for chunks. 60.96MiB in use in bin. 60.96MiB client-requested in use in bin.
2018-10-14 15:22:58.265548: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (524288): Total Chunks: 25, Chunks in use: 25. 16.22MiB allocated for chunks. 16.22MiB in use in bin. 16.22MiB client-requested in use in bin.
2018-10-14 15:22:58.265554: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (1048576): Total Chunks: 147, Chunks in use: 147. 251.51MiB allocated for chunks. 251.51MiB in use in bin. 251.51MiB client-requested in use in bin.
2018-10-14 15:22:58.265581: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (2097152): Total Chunks: 25, Chunks in use: 23. 65.92MiB allocated for chunks. 59.80MiB in use in bin. 58.57MiB client-requested in use in bin.
2018-10-14 15:22:58.265588: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (4194304): Total Chunks: 184, Chunks in use: 180. 1.29GiB allocated for chunks. 1.26GiB in use in bin. 1.26GiB client-requested in use in bin.
2018-10-14 15:22:58.265606: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (8388608): Total Chunks: 160, Chunks in use: 160. 2.16GiB allocated for chunks. 2.16GiB in use in bin. 2.06GiB client-requested in use in bin.
2018-10-14 15:22:58.265625: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (16777216): Total Chunks: 18, Chunks in use: 18. 474.96MiB allocated for chunks. 474.96MiB in use in bin. 433.52MiB client-requested in use in bin.
2018-10-14 15:22:58.265645: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (33554432): Total Chunks: 18, Chunks in use: 18. 872.66MiB allocated for chunks. 872.66MiB in use in bin. 789.21MiB client-requested in use in bin.
2018-10-14 15:22:58.265652: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (67108864): Total Chunks: 9, Chunks in use: 9. 741.14MiB allocated for chunks. 741.14MiB in use in bin. 700.45MiB client-requested in use in bin.
2018-10-14 15:22:58.265674: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (134217728): Total Chunks: 3, Chunks in use: 3. 415.88MiB allocated for chunks. 415.88MiB in use in bin. 415.88MiB client-requested in use in bin.
2018-10-14 15:22:58.265682: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (268435456): Total Chunks: 2, Chunks in use: 2. 665.38MiB allocated for chunks. 665.38MiB in use in bin. 568.97MiB client-requested in use in bin.
2018-10-14 15:22:58.265688: I tensorflow/core/common_runtime/bfc_allocator.cc:646] Bin for 13.29MiB was 8.00MiB, Chunk State:
2018-10-14 15:22:58.265695: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x7f1336000000 of size 1280
2018-10-14 15:22:58.265700: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x7f1336000500 of size 256
2018-10-14 15:22:58.265705: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x7f1336000600 of size 256
2018-10-14 15:22:58.265710: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x7f1336000700 of size 256
2018-10-14 15:22:58.265715: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x7f1336000800 of size 256
```
There are many more lines like the ones above, with "Chunk at ..." entries of much bigger sizes, e.g. 13934592.
```
2018-10-14 15:22:58.301907: I tensorflow/core/common_runtime/bfc_allocator.cc:671] Summary of in-use Chunks by size:
2018-10-14 15:22:58.301918: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1406 Chunks of size 256 totalling 351.5KiB
2018-10-14 15:22:58.301925: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 39 Chunks of size 512 totalling 19.5KiB
2018-10-14 15:22:58.301931: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 212 Chunks of size 768 totalling 159.0KiB
2018-10-14 15:22:58.301936: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 7 Chunks of size 1024 totalling 7.0KiB
2018-10-14 15:22:58.301941: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 5 Chunks of size 1280 totalling 6.2KiB
2018-10-14 15:22:58.301947: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 255 Chunks of size 1536 totalling 382.5KiB
2018-10-14 15:22:58.301952: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 1792 totalling 1.8KiB
2018-10-14 15:22:58.301957: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 2048 totalling 4.0KiB
2018-10-14 15:22:58.301963: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 330 Chunks of size 2816 totalling 907.5KiB
2018-10-14 15:22:58.301969: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 4 Chunks of size 3072 totalling 12.0KiB
2018-10-14 15:22:58.301974: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 4352 totalling 25.5KiB
2018-10-14 15:22:58.301980: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 72 Chunks of size 6144 totalling 432.0KiB
2018-10-14 15:22:58.301985: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 14 Chunks of size 7168 totalling 98.0KiB
2018-10-14 15:22:58.301991: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 12 Chunks of size 8448 totalling 99.0KiB
2018-10-14 15:22:58.301997: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 9728 totalling 19.0KiB
2018-10-14 15:22:58.302002: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 10496 totalling 10.2KiB
2018-10-14 15:22:58.302008: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 75 Chunks of size 12288 totalling 900.0KiB
2018-10-14 15:22:58.302013: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 12 Chunks of size 16128 totalling 189.0KiB
2018-10-14 15:22:58.302019: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 8 Chunks of size 16640 totalling 130.0KiB
2018-10-14 15:22:58.302024: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 48 Chunks of size 16896 totalling 792.0KiB
2018-10-14 15:22:58.302030: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 4 Chunks of size 18944 totalling 74.0KiB
2018-10-14 15:22:58.302035: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 22272 totalling 21.8KiB
2018-10-14 15:22:58.302040: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 76 Chunks of size 24320 totalling 1.76MiB
2018-10-14 15:22:58.302046: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 20 Chunks of size 28416 totalling 555.0KiB
2018-10-14 15:22:58.302051: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 32256 totalling 31.5KiB
2018-10-14 15:22:58.302057: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 55 Chunks of size 33792 totalling 1.77MiB
2018-10-14 15:22:58.302062: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 44032 totalling 43.0KiB
2018-10-14 15:22:58.302068: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 56576 totalling 331.5KiB
2018-10-14 15:22:58.302080: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 64512 totalling 63.0KiB
2018-10-14 15:22:58.302090: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 8 Chunks of size 66048 totalling 516.0KiB
2018-10-14 15:22:58.302101: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 56 Chunks of size 67328 totalling 3.60MiB
2018-10-14 15:22:58.302111: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 120 Chunks of size 112896 totalling 12.92MiB
2018-10-14 15:22:58.302121: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 7 Chunks of size 131840 totalling 901.2KiB
2018-10-14 15:22:58.302132: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 39 Chunks of size 175872 totalling 6.54MiB
2018-10-14 15:22:58.302143: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 4 Chunks of size 225792 totalling 882.0KiB
2018-10-14 15:22:58.302153: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 351488 totalling 686.5KiB
2018-10-14 15:22:58.302163: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 140 Chunks of size 451584 totalling 60.29MiB
2018-10-14 15:22:58.302172: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 22 Chunks of size 677376 totalling 14.21MiB
2018-10-14 15:22:58.302181: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 702720 totalling 2.01MiB
2018-10-14 15:22:58.302190: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 4 Chunks of size 1354752 totalling 5.17MiB
2018-10-14 15:22:58.302199: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 143 Chunks of size 1806336 totalling 246.34MiB
2018-10-14 15:22:58.302205: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 22 Chunks of size 2709504 totalling 56.85MiB
2018-10-14 15:22:58.302210: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 3098624 totalling 2.96MiB
2018-10-14 15:22:58.302216: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 5419008 totalling 15.50MiB
2018-10-14 15:22:58.302221: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 6464256 totalling 6.16MiB
2018-10-14 15:22:58.302229: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 4 Chunks of size 7225344 totalling 27.56MiB
2018-10-14 15:22:58.302235: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 172 Chunks of size 7560192 totalling 1.21GiB
2018-10-14 15:22:58.302241: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 18 Chunks of size 10838016 totalling 186.05MiB
2018-10-14 15:22:58.302246: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 11340032 totalling 10.81MiB
2018-10-14 15:22:58.302252: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 18 Chunks of size 13934592 totalling 239.20MiB
2018-10-14 15:22:58.302257: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 15119872 totalling 14.42MiB
2018-10-14 15:22:58.302263: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 121 Chunks of size 15120128 totalling 1.70GiB
2018-10-14 15:22:58.302268: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 16333056 totalling 15.58MiB
2018-10-14 15:22:58.302274: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 17280000 totalling 16.48MiB
2018-10-14 15:22:58.302279: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 19439872 totalling 18.54MiB
2018-10-14 15:22:58.302285: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 22680064 totalling 21.63MiB
2018-10-14 15:22:58.302290: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 22680320 totalling 21.63MiB
2018-10-14 15:22:58.302296: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 23029760 totalling 21.96MiB
2018-10-14 15:22:58.302301: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 30038528 totalling 28.65MiB
2018-10-14 15:22:58.302307: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 8 Chunks of size 30240000 totalling 230.71MiB
2018-10-14 15:22:58.302312: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 4 Chunks of size 30240256 totalling 115.36MiB
2018-10-14 15:22:58.302318: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 34560000 totalling 65.92MiB
2018-10-14 15:22:58.302323: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 5 Chunks of size 45360128 totalling 216.29MiB
2018-10-14 15:22:58.302329: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 45360384 totalling 43.26MiB
2018-10-14 15:22:58.302334: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 49717248 totalling 142.24MiB
2018-10-14 15:22:58.302340: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 59983616 totalling 57.20MiB
2018-10-14 15:22:58.302345: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 60278784 totalling 172.46MiB
2018-10-14 15:22:58.302351: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 60480000 totalling 57.68MiB
2018-10-14 15:22:58.302356: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 60480512 totalling 57.68MiB
2018-10-14 15:22:58.302362: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 62841088 totalling 59.93MiB
2018-10-14 15:22:58.302367: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 67780608 totalling 64.64MiB
2018-10-14 15:22:58.302373: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 74319872 totalling 70.88MiB
2018-10-14 15:22:58.302378: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 7 Chunks of size 90720000 totalling 605.62MiB
2018-10-14 15:22:58.302384: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 137779712 totalling 131.40MiB
2018-10-14 15:22:58.302389: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 149151744 totalling 284.48MiB
2018-10-14 15:22:58.302395: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 298303488 totalling 284.48MiB
2018-10-14 15:22:58.302400: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 399398144 totalling 380.90MiB
2018-10-14 15:22:58.302405: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 6.93GiB
```
For future reference, you may want to look at the GitHub Markdown Guide to ensure that your post is easily readable.
Wrapping large code blocks in ```triple backticks``` will allow maintainers to spend less time trying to make sense of the post and more time trying to help with your issue:
```
# some code
some more code
```
I am having a similar issue with a GTX 1080 Ti 11 GB; I am only able to train if I use a resolution of 600x600 px. Is that expected? Is there something I could do to train at 1200x1200? I also have only 16 GB of CPU memory; could that be causing the error?
Totally agree with @tasercake. @CBCBCBCBCBCBCBCB, would you please update your post by following the guide? Also, you don't need to copy and paste the entire error message; it's too long for others who want to help to read. The few important error lines are usually enough to debug the issue.
For 1200x1200 scale you do need more memory.
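To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. The 13.29 MiB failed allocation and the 6.93 GiB in-use total are taken from the allocator dump above, and the 8 GB figure is the original poster's GTX 1070; the quadratic scaling with input resolution at the end is only an approximation, not a measurement.

```python
# Back-of-the-envelope check of the numbers in the allocator dump above.
# The tensor shape and the in-use total come from the log; the scaling
# rule at the end is only a rough approximation.

BYTES_PER_FLOAT32 = 4

def tensor_mib(shape):
    """Size in MiB of a float32 tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_FLOAT32 / 2**20

# The allocation that failed: a single activation of shape [64, 672, 9, 9].
print("failed allocation: %.2f MiB" % tensor_mib([64, 672, 9, 9]))  # ~13.29 MiB

# The allocator already had ~6.93 GiB of chunks in use on an 8 GB card,
# so even this relatively small tensor could no longer be placed.

# Activation memory grows roughly with input area, so resizing inputs from
# 1200x1200 down to 600x600 cuts that part of the footprint by about 4x.
print("area ratio 1200^2 / 600^2: %.1f" % ((1200 * 1200) / (600 * 600)))
```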
Closing this issue since it has been resolved. Feel free to reopen if you have any follow-up questions. Thanks!
System information
- What is the top-level directory of the model you are using: /home/user/Work
- Have I written custom code: No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): 1.10.1
- Bazel version (if compiling from source): I don't use it
- CUDA/cuDNN version: CUDA 9.0 / cuDNN 7.1
- GPU model and memory: Geforce GTX 1070
- Exact command to reproduce: No

############################################################
[Question]
I just tried to train "faster_rcnn_nas_coco" and got an error message (included below, after the configuration). I think the reason for the error is my GPU memory. I have a Geforce GTX 1070 with 8 GB. Do I need more GPU memory?

Thanks for reading this!
Here is my "faster_rcnn_nas_coco.config" code
```
model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      # TODO(shlens): Only fixed_shape_resizer is currently supported for NASNet
    }
  }

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "faster_rcnn_nas_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "data/object-detection.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "data/object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}
```

############################################################
[Error log]

Here is my command and the error message. The message was too long, so I cut the front part; I posted some of that part in a comment.
cb@cb-B150M-DS3H:~/Work/models/research/object_detection$ python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_nas_coco.config
2018-10-14 15:22:58.302414: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats: Limit: 7474652775 InUse: 7437627904 MaxInUse: 7437645056 NumAllocs: 10356 MaxAllocSize: 4076716032
2018-10-14 15:22:58.302586: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ***x****
2018-10-14 15:22:58.302609: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at conv_ops.cc:398 : Resource exhausted: OOM when allocating tensor with shape[64,672,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.ResourceExhaustedError'>, OOM when allocating tensor with shape[64,2016,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: SecondStageFeatureExtractor/cell_12/AvgPool_1 = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'SecondStageFeatureExtractor/cell_12/AvgPool_1', defined at:
File "train.py", line 184, in <module>
tf.app.run()
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 272, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/home/cb/Work/models/research/object_detection/legacy/trainer.py", line 290, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "/home/cb/Work/models/research/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "/home/cb/Work/models/research/object_detection/legacy/trainer.py", line 203, in _create_losses
prediction_dict = detection_model.predict(images, true_image_shapes)
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 680, in predict
self._anchors.get(), image_shape, true_image_shapes))
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 776, in _predict_second_stage
scope=self.second_stage_feature_extractor_scope))
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 187, in extract_box_classifier_features
return self._extract_box_classifier_features(proposal_feature_maps, scope)
File "/home/cb/Work/models/research/object_detection/models/faster_rcnn_nas_feature_extractor.py", line 282, in _extract_box_classifier_features
start_cell_num=start_cell_num)
File "/home/cb/Work/models/research/object_detection/models/faster_rcnn_nas_feature_extractor.py", line 99, in _build_nasnet_base
cell_num=true_cell_num)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 311, in call
net = self._cell_base(net, prev_layer)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 289, in _cell_base
prev_layer = self._reduce_prev_layer(prev_layer, net)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 276, in _reduce_prev_layer
prev_layer, curr_num_filters, stride=2)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 117, in factorized_reduction
path2, [1, 1, 1, 1], stride_spec, 'VALID', data_format=data_format)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2124, in avg_pool
name=name)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 71, in avg_pool
data_format=data_format, name=name)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[64,2016,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: SecondStageFeatureExtractor/cell_12/AvgPool_1 = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.ResourceExhaustedError'>, OOM when allocating tensor with shape[64,2016,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: SecondStageFeatureExtractor/cell_12/AvgPool_1 = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'SecondStageFeatureExtractor/cell_12/AvgPool_1', defined at:
File "train.py", line 184, in <module>
tf.app.run()
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 272, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/home/cb/Work/models/research/object_detection/legacy/trainer.py", line 290, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "/home/cb/Work/models/research/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "/home/cb/Work/models/research/object_detection/legacy/trainer.py", line 203, in _create_losses
prediction_dict = detection_model.predict(images, true_image_shapes)
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 680, in predict
self._anchors.get(), image_shape, true_image_shapes))
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 776, in _predict_second_stage
scope=self.second_stage_feature_extractor_scope))
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 187, in extract_box_classifier_features
return self._extract_box_classifier_features(proposal_feature_maps, scope)
File "/home/cb/Work/models/research/object_detection/models/faster_rcnn_nas_feature_extractor.py", line 282, in _extract_box_classifier_features
start_cell_num=start_cell_num)
File "/home/cb/Work/models/research/object_detection/models/faster_rcnn_nas_feature_extractor.py", line 99, in _build_nasnet_base
cell_num=true_cell_num)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 311, in call
net = self._cell_base(net, prev_layer)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 289, in _cell_base
prev_layer = self._reduce_prev_layer(prev_layer, net)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 276, in _reduce_prev_layer
prev_layer, curr_num_filters, stride=2)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 117, in factorized_reduction
path2, [1, 1, 1, 1], stride_spec, 'VALID', data_format=data_format)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2124, in avg_pool
name=name)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 71, in avg_pool
data_format=data_format, name=name)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[64,2016,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: SecondStageFeatureExtractor/cell_12/AvgPool_1 = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Traceback (most recent call last):
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[64,2016,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: SecondStageFeatureExtractor/cell_12/AvgPool_1 = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "train.py", line 184, in
tf.app.run()
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 272, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/home/cb/Work/models/research/object_detection/legacy/trainer.py", line 415, in train
saver=saver)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 770, in train
sess, train_op, global_step, train_step_kwargs)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 487, in train_step
run_metadata=run_metadata)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[64,2016,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: SecondStageFeatureExtractor/cell_12/AvgPool_1 = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'SecondStageFeatureExtractor/cell_12/AvgPool_1', defined at:
File "train.py", line 184, in <module>
tf.app.run()
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 272, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/home/cb/Work/models/research/object_detection/legacy/trainer.py", line 290, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "/home/cb/Work/models/research/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "/home/cb/Work/models/research/object_detection/legacy/trainer.py", line 203, in _create_losses
prediction_dict = detection_model.predict(images, true_image_shapes)
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 680, in predict
self._anchors.get(), image_shape, true_image_shapes))
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 776, in _predict_second_stage
scope=self.second_stage_feature_extractor_scope))
File "/home/cb/Work/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 187, in extract_box_classifier_features
return self._extract_box_classifier_features(proposal_feature_maps, scope)
File "/home/cb/Work/models/research/object_detection/models/faster_rcnn_nas_feature_extractor.py", line 282, in _extract_box_classifier_features
start_cell_num=start_cell_num)
File "/home/cb/Work/models/research/object_detection/models/faster_rcnn_nas_feature_extractor.py", line 99, in _build_nasnet_base
cell_num=true_cell_num)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 311, in call
net = self._cell_base(net, prev_layer)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 289, in _cell_base
prev_layer = self._reduce_prev_layer(prev_layer, net)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 276, in _reduce_prev_layer
prev_layer, curr_num_filters, stride=2)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/cb/Work/models/research/slim/nets/nasnet/nasnet_utils.py", line 117, in factorized_reduction
path2, [1, 1, 1, 1], stride_spec, 'VALID', data_format=data_format)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2124, in avg_pool
name=name)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 71, in avg_pool
data_format=data_format, name=name)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/cb/anaconda3/envs/MaskRCNN/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[64,2016,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: SecondStageFeatureExtractor/cell_12/AvgPool_1 = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
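As the hint in the log says, report_tensor_allocations_upon_oom can be set on tf.RunOptions so that the OOM error lists the tensors that were alive when the allocation failed. The legacy train.py does not expose this option, so the snippet below is only a minimal TF 1.x sketch of the mechanism, with a toy op standing in for the real training step; you would have to thread the options into the slim training loop yourself.

```python
import tensorflow as tf  # TF 1.x API, matching the version in this issue

# Sketch only: report_tensor_allocations_upon_oom is a field of the
# tf.RunOptions proto. The toy op below stands in for the real train_op.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

toy_op = tf.reduce_sum(tf.random_normal([4, 4]))

with tf.Session() as sess:
    try:
        # Pass the options to the run call whose allocations you want traced.
        print(sess.run(toy_op, options=run_options))
    except tf.errors.ResourceExhaustedError as err:
        # With the option set, the message includes the per-tensor
        # allocation report instead of just the generic OOM hint.
        print(err.message)
```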