Ashutosh1995 closed this issue 3 years ago.
I used my own custom dataset
I figured it out by checking the category items.
@nTnZone, were you able to solve it?
@zylo117, in the collater function, you pad the annotation matrix with rows of -1's.
Doesn't this mean the labels also get -1's? Could that be what triggers the CUDA device-side assert?
Please advise!
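To make the padding question concrete, here is a minimal, hypothetical sketch in plain Python (not the repo's code) of padding per-image annotations with -1 rows and then discarding those rows before the class IDs are used as indices. It assumes each row is `[x1, y1, x2, y2, class_id]` with a 0-based `class_id`. If the loss filters out padded rows the same way, which is my reading of the repo but worth verifying in its loss code, the -1 padding on its own should be harmless; a real label outside `[0, num_classes)` is the more likely culprit.

```python
# Hypothetical sketch: -1 padding in a detection collate step and its removal.
# Assumes each annotation row is [x1, y1, x2, y2, class_id], class_id 0-based.
PAD = -1


def pad_annotations(batch_annots):
    """Pad every image's annotation list to the same length with rows of -1."""
    max_len = max((len(a) for a in batch_annots), default=0)
    max_len = max(max_len, 1)  # keep at least one row even if no image has boxes
    padded = []
    for annots in batch_annots:
        rows = [list(r) for r in annots]
        rows += [[PAD] * 5 for _ in range(max_len - len(rows))]
        padded.append(rows)
    return padded


def strip_padding(annots):
    """Drop padding rows (class_id == -1) before labels are used as indices."""
    return [r for r in annots if r[4] != PAD]


def one_hot(class_id, num_classes):
    """Encode a label; reject anything outside [0, num_classes).

    Python lists would silently accept -1 as "last element", so the explicit
    range check stands in for the bounds checks whose failure on the GPU
    surfaces as "device-side assert triggered".
    """
    if not 0 <= class_id < num_classes:
        raise IndexError(f"class id {class_id} out of range [0, {num_classes})")
    vec = [0] * num_classes
    vec[class_id] = 1
    return vec


if __name__ == "__main__":
    batch = [[[10, 10, 50, 50, 0]], [[5, 5, 20, 20, 2], [30, 30, 60, 60, 1]]]
    padded = pad_annotations(batch)
    print(padded[0])                 # first image now has one -1 padding row
    print(strip_padding(padded[0]))  # padding removed before encoding labels
```

Filtering on the class column alone is enough here because a real annotation never has a class of -1.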
Category IDs start from 1.
@zylo117, in my VOC-to-COCO conversion script, I numbered my categories starting from 1, as below:
PRE_DEFINE_CATEGORIES = {"auto": 1, "bicycle": 2, "bus": 3, "biker": 4, "car": 5, "cow": 6, "cyclist": 7, "dog": 8, "motorbike": 9, "minitruck": 10, "person": 11, "truck": 12, "van": 13, "tractor": 14, "trolley": 15}
Also, in the project.yml file, I defined obj_list as ["auto", "bicycle", "bus", "biker", "car", "cow", "cyclist", "dog", "motorbike", "minitruck", "person", "truck", "van", "tractor", "trolley"], i.e. in the same order as above.
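One way to double-check this setup is a small consistency test between the category dictionary and obj_list. The sketch below is hypothetical and assumes the dataset loader turns a COCO `category_id` into a 0-based label by subtracting 1, so IDs must be exactly 1..N and ID k must correspond to `obj_list[k - 1]`. The function name `check_categories` is made up for illustration.

```python
# Hypothetical check that COCO category IDs line up with obj_list,
# assuming the loader converts category_id to a 0-based label via id - 1.
PRE_DEFINE_CATEGORIES = {
    "auto": 1, "bicycle": 2, "bus": 3, "biker": 4, "car": 5, "cow": 6,
    "cyclist": 7, "dog": 8, "motorbike": 9, "minitruck": 10, "person": 11,
    "truck": 12, "van": 13, "tractor": 14, "trolley": 15,
}
obj_list = ["auto", "bicycle", "bus", "biker", "car", "cow", "cyclist", "dog",
            "motorbike", "minitruck", "person", "truck", "van", "tractor",
            "trolley"]


def check_categories(categories, obj_list):
    """Return a list of problems; an empty list means the mapping is consistent."""
    problems = []
    ids = sorted(categories.values())
    expected = list(range(1, len(obj_list) + 1))
    if ids != expected:
        problems.append(f"category ids {ids} are not exactly 1..{len(obj_list)}")
    for name, cid in categories.items():
        idx = cid - 1  # assumed 0-based label used by the model
        if not 0 <= idx < len(obj_list) or obj_list[idx] != name:
            problems.append(f"{name!r} (id {cid}) does not match obj_list[{idx}]")
    return problems


if __name__ == "__main__":
    print(check_categories(PRE_DEFINE_CATEGORIES, obj_list))  # [] if consistent
```

For the dictionary and list posted above, the check reports no problems, so the mapping itself looks fine under that assumption.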
Is there anything else I should do? The training code always stops during validation, when images are transferred to CUDA, printing the same error: CUDA error: device-side assert triggered
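Since the crash shows up only at validation, the validation annotation file is worth scanning on its own. The sketch below is a hypothetical helper, not part of the repo: it walks a COCO-style JSON dict and flags annotations whose `category_id` is not declared or falls outside 1..N, plus (as an extra generic sanity check) boxes with zero width or height. The file path in the usage note is an assumed example.

```python
# Hypothetical scan of a COCO-style annotation file for labels that could
# index out of range during training or validation.
import json


def find_bad_annotations(coco, num_classes):
    """Return (annotation id, reason) pairs for suspicious annotations."""
    bad = []
    valid_ids = {c["id"] for c in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        cid = ann.get("category_id")
        x, y, w, h = ann.get("bbox", [0, 0, 0, 0])
        # `cid not in valid_ids` is checked first, so a missing (None) id
        # never reaches the numeric comparison.
        if cid not in valid_ids or not 1 <= cid <= num_classes:
            bad.append((ann.get("id"), f"category_id {cid} outside 1..{num_classes}"))
        elif w < 1 or h < 1:
            bad.append((ann.get("id"), f"degenerate bbox {[x, y, w, h]}"))
    return bad


def check_file(path, num_classes):
    """Load a COCO JSON file from disk and scan it."""
    with open(path) as f:
        return find_bad_annotations(json.load(f), num_classes)
```

Usage, with a hypothetical path: `check_file("datasets/indian_road/annotations/instances_val.json", 15)`. An empty result means the labels at least fall in the expected range.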
It's hard to tell without more details.
@zylo117, could you please tell me what details you need to diagnose the issue?
What's the error? Do you have logs?
In the training loop, I get the following error:
And when the validation loop begins, the code breaks with the following error:
So you can still train for a few steps? Could it be an out-of-memory (OOM) error? You should monitor VRAM usage with nvidia-smi.
Within the same epoch, when the validation stage starts, it triggers the warning and then quits.
Is the memory usage shown in nvidia-smi the VRAM?
If not, could you please give me a pointer on how to measure VRAM usage?
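The Memory-Usage column in nvidia-smi is the GPU memory (VRAM) currently in use out of the total. To log it while training runs, nvidia-smi can print just those fields as CSV; the sketch below wraps that query in Python. It assumes the NVIDIA driver tools are on PATH, and the helper names are made up for illustration. With `nounits`, memory values come back in MiB.

```python
# Hypothetical helper: read per-GPU VRAM usage from nvidia-smi's CSV query mode.
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Parse lines like '0, 3456, 8192' (values in MiB) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        idx, used, total = (int(v.strip()) for v in line.split(","))
        gpus.append({"index": idx, "used_mib": used, "total_mib": total})
    return gpus


def query_vram():
    """Run nvidia-smi once and return the parsed per-GPU memory usage."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory(out)
```

Calling `query_vram()` periodically (or simply running `nvidia-smi -l 1` in a second terminal) shows whether usage climbs toward the total right before validation starts.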
Validation? But there are still a few thousand steps remaining.
Did you modify the code? Can you run the tutorials?
Actually, training completes; it only emits the error I pasted earlier as a warning.
It's when the validation phase starts that the code breaks with a RuntimeError: CUDA error: device-side assert triggered
I did not modify the code.
I will also run the tutorials. I ran the test code and it worked perfectly.
It got resolved. Thanks!
@zylo117, after resolving my earlier issue, I am now training EfficientDet-D0 on my custom dataset.
I am running the following command:
python train.py -c 0 -p indian_road --batch_size 4 --lr 1e-3 --num_epochs 15 --load_weights weights/efficientdet-d0.pth
During the final stage of training, I am getting the following error:
Kindly help!