thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0

Segmentation fault (core dumped) #1107

LuvRC closed 4 years ago

LuvRC commented 4 years ago

I am training a custom object detector with a single class (solar panel), so I set classes=1 and changed filters to 30. I am running RHEL 6 as a guest on VMware 15.05, with 400 GB allocated to the RHEL 6 guest.
I am using the yolo.cfg configuration file and the yolo.weights weights file for training. I also changed label.txt so that it contains only "solar_panel".
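For readers checking the numbers: filters=30 is consistent with the usual YOLOv2 rule that the convolutional layer feeding the region layer needs `num * (classes + 5)` filters, where `num` is the number of anchor boxes (5 in the stock yolo.cfg). The sketch below only reproduces that arithmetic; the function name is hypothetical and not part of darkflow.

```python
def last_layer_filters(num_classes: int, num_anchors: int = 5, coords: int = 4) -> int:
    """Filters for the conv layer that feeds a YOLOv2 [region] layer.

    Each anchor box predicts `coords` box coordinates, one objectness
    score, and one score per class. (Hypothetical helper, for illustration.)
    """
    return num_anchors * (coords + 1 + num_classes)


if __name__ == "__main__":
    # One class (solar_panel) with the default 5 anchors.
    print(last_layer_filters(1))
    # The stock 80-class COCO configuration, for comparison.
    print(last_layer_filters(80))
```

If the number of anchors in the cfg is changed, the same formula applies with the new `num` value.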

After executing this command:

    python flow  --model cfg/yolo-1c.cfg --load bin/yolo.weights --train --annotation new_model_data/annotations --dataset new_model_data/images  --epoch 400

The process failed. These are the last lines of output after executing the command above:

    2019-12-19 03:45:00.680721: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 2.80GiB
    2019-12-19 03:45:00.680729: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 3011067904 memory_limit_: 3011067904 available bytes: 0 curr_region_allocation_bytes_: 4294967296
    2019-12-19 03:45:00.680748: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
    Limit:                  3011067904
    InUse:                  3011067904
    MaxInUse:               3011067904
    NumAllocs:                    1574
    MaxAllocSize:            830704384

    2019-12-19 03:45:00.680786: W tensorflow/core/common_runtime/bfc_allocator.cc:319] **************************x**********************************x**************************************
    2019-12-19 03:45:00.689879: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at mkl_util.h:1026 : Resource exhausted: OOM when allocating tensor with shape[189267968] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
    Segmentation fault (core dumped)
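The numbers in this log can be checked by hand. The short sketch below, which assumes that "type float" means a 4-byte float32 (TensorFlow's default float type), converts the failed allocation and the allocator limit into GiB. All values are copied from the log above; nothing is measured.

```python
BYTES_PER_FLOAT32 = 4          # assumption: "type float" is 32-bit
GIB = 1024 ** 3

requested_elements = 189_267_968     # shape[189267968] in the OOM message
memory_limit_bytes = 3_011_067_904   # memory_limit_ in the allocator line
in_use_bytes = 3_011_067_904         # InUse in the Stats block

requested_bytes = requested_elements * BYTES_PER_FLOAT32
free_bytes = memory_limit_bytes - in_use_bytes

print(f"requested tensor: {requested_bytes / GIB:.2f} GiB")
print(f"allocator limit:  {memory_limit_bytes / GIB:.2f} GiB")
print(f"free at failure:  {free_bytes} bytes")
```

The allocator was capped at roughly 2.8 GiB and had nothing left when a single tensor of about 0.7 GiB was requested, so the failure is a plain out-of-memory condition rather than a problem with the cfg edits.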

Problem images: [six screenshots]

Please help me solve this problem. Thank you.

LuvRC commented 4 years ago

@thtrieu, please help. Thanks a lot in advance.

LuvRC commented 4 years ago

@ankitAMD please help.

ankitAMD commented 4 years ago

Everything is fine; you did it correctly. Use another configuration file that has fewer layers, such as tiny-yolo.cfg. yolo.cfg contains more than 22 convolutional layers, so it needs more memory, and that is why this error occurs.

If you later provide enough memory to train with yolo.cfg, training will still be slow on a CPU, because a CPU has far less processing power than a GPU.

ankitAMD commented 4 years ago

Always work on a new copy of yolo.cfg, such as yolo-1c.cfg. Don't edit yolo.cfg itself.
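One way to follow this advice without hand-editing is to generate the copy from the original text. The sketch below is only an illustration, not part of darkflow: `make_custom_cfg` is a hypothetical helper, and it assumes a YOLOv2-style cfg with a single `[region]` section, where the last `filters=` line before that section and the first `classes=` line after it are the ones to change.

```python
import re


def make_custom_cfg(cfg_text: str, num_classes: int, num_anchors: int = 5) -> str:
    """Return a modified copy of a YOLOv2-style cfg; the input is left untouched.

    Hypothetical helper: assumes exactly one [region] section.
    """
    lines = cfg_text.splitlines()
    region_idx = next(i for i, line in enumerate(lines) if line.strip() == "[region]")
    filters = num_anchors * (num_classes + 5)

    # Walk backwards to the last filters= line before [region].
    for i in range(region_idx - 1, -1, -1):
        if re.match(r"\s*filters\s*=", lines[i]):
            lines[i] = f"filters={filters}"
            break

    # Walk forwards to the first classes= line after [region].
    for i in range(region_idx + 1, len(lines)):
        if re.match(r"\s*classes\s*=", lines[i]):
            lines[i] = f"classes={num_classes}"
            break

    return "\n".join(lines) + "\n"


# Tiny stand-in for the tail of a cfg file, used only for this demo.
sample = """[convolutional]
filters=1024
activation=leaky

[convolutional]
filters=425
activation=linear

[region]
classes=80
num=5
"""

custom = make_custom_cfg(sample, num_classes=1)
print(custom)
```

In practice the text would be read from cfg/yolo.cfg and the result written to cfg/yolo-1c.cfg, so the original file stays available for comparison.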

LuvRC commented 4 years ago

Thank you, Ankit.

Thank you @ankitAMD, this helped me solve the error. Thank you again.