ValueError during training on custom dataset

lee-t commented 7 years ago

I just need a little help with something that keeps happening to my training runs. I'm trying to train tiny-yolo on my own dataset and I can't complete a full training run.

The command used: python3 flow --model cfg/tiny-yolo-voc-new.cfg --load bin/tiny-yolo-voc.weights --train --dataset annotations_pascal/JPEGImages --annotation annotations_pascal/Annotations --gpu 0.7

And the error I get during the run:

Finish 240 epoch(es)
step 2401 - loss 1.0915464162826538 - moving ave loss 0.7566407958832121
step 2402 - loss 0.25175240635871887 - moving ave loss 0.7061519569307627
Traceback (most recent call last):
  File "flow", line 6, in <module>
    cliHandler(sys.argv)
  File "/home/Scratch/darkflow-clone/darkflow/cli.py", line 29, in cliHandler
    print('Enter training ...'); tfnet.train()
  File "/home/Scratch/darkflow-clone/darkflow/net/flow.py", line 39, in train
    for i, (x_batch, datum) in enumerate(batches):
  File "/home/Scratch/darkflow-clone/darkflow/net/yolo/data.py", line 126, in shuffle
    x_batch = np.concatenate(x_batch, 0)
ValueError: need at least one array to concatenate

As far as I can tell, x_batch is empty and shuffle raises this error. I don't understand enough about the code to know why this would occur during training.
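For anyone reading along, the error message itself can be reproduced outside darkflow. This is only a minimal sketch of what NumPy does when it is handed an empty list; the helper name try_concatenate is made up for illustration and is not part of darkflow.

```python
import numpy as np

def try_concatenate(arrays):
    """Hypothetical helper: concatenate along axis 0, or return the error text."""
    try:
        return np.concatenate(arrays, 0)
    except ValueError as exc:
        return str(exc)

# An empty list of arrays triggers the same ValueError seen in the traceback.
message = try_concatenate([])
print(message)

# With at least one array the call succeeds and stacks along axis 0.
stacked = try_concatenate([np.zeros((1, 2, 2, 3)), np.ones((1, 2, 2, 3))])
print(stacked.shape)
```

So the traceback is consistent with shuffle reaching np.concatenate with no images collected for that batch. It does not say why the batch ended up empty.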

dimaxano commented 7 years ago

Hi! Could you tell me the value of subdivisions in your .cfg file?

lee-t commented 7 years ago

From the file tiny-yolo-voc-new.cfg:

[net]
batch=64
subdivisions=8
width=416
height=416
channels=3

dimaxano commented 7 years ago

If you still have this problem, try increasing subdivisions in powers of 2 (2, 4, 8).

tankienleong commented 7 years ago

Hi @dimaxano, may I know which part of the darkflow code uses subdivisions?

dimaxano commented 7 years ago

Hi! Unfortunately, I didn't find any mention of "subdivision" in the source code. But increasing this parameter allows me to avoid ValueError: need at least one array to concatenate

tankienleong commented 7 years ago

Hi @lee-t, I think the error is caused by the data augmentation in the preprocess function in the predict.py file. You can try disabling the scale, translation, flipping and recolor functions.

danhdevelop commented 6 years ago

I have this problem too, and I found out that it came from unclean data. I added the line "if len(x_batch) == 0: continue" above the line "x_batch = np.concatenate(x_batch, 0)" in darkflow/net/yolo/data.py to skip the empty batch. Hope this helps.
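To show the idea of that guard in isolation, here is a rough standalone sketch. It is not darkflow's actual shuffle function: iter_batches, the samples list, and the use of None to mark an unusable image are all assumptions made up for this example.

```python
import numpy as np

def iter_batches(samples, batch_size):
    """Hypothetical stand-in for a batch generator.

    `samples` holds arrays of shape (1, H, W, C), or None for an image
    that could not be used (for example, a bad annotation).
    """
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        x_batch = [s for s in chunk if s is not None]
        # The guard: skip a batch in which no usable image was collected,
        # instead of letting np.concatenate raise on an empty list.
        if len(x_batch) == 0:
            continue
        yield np.concatenate(x_batch, 0)

good = np.zeros((1, 2, 2, 3))
samples = [good, None, None, None, good, good]
shapes = [batch.shape for batch in iter_batches(samples, batch_size=2)]
print(shapes)
```

Note that a guard like this only hides the symptom. The unusable images are silently dropped, so it is probably still worth finding out which annotations or images are causing the empty batches.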

aseembh2001 commented 5 years ago

My guess is that it has something to do with memory. I reduced the batch size in my code and it worked fine. Not very sure though.

thtrieu / darkflow

ValueError during training on custom dataset #373