@SMDIndomitable Can you provide your training command? Also, how is your dataset (number of samples, diversity, etc.)?
One possible cause of the NaN issue is the precision type. My code is expected to work with the NVIDIA Ampere architecture and above. If your GPU architecture is outside that range, please reply with that information and I will guide you through updating the code.
Sorry, I can't provide the pretrained weights right now for privacy-related reasons. For efficient training, you could try using a pretrained backbone from timm and adapting it to this code's architecture.
Hello, thank you for the quick reply.
I am just using this command: python train.py --data path/to/dataset_folder/
My dataset is roughly 9.5k training samples and 3.5k evaluation samples.
GPU: H100
I managed to get it training today without the NaN issue. However, the F1 score is quite low; is this supposed to be normal?
After training for 22 epochs,
[TRAIN] IoU: 0.0240, F1_Score: 0.0408
[TEST ] IoU: 0.0295, F1_Score: 0.0501
Also, sorry but what is timm?
@SMDIndomitable It seems like you're training from scratch with the default architecture from the paper. In the paper, they used a small dataset. If I remember correctly, the author expects the loss to be around 2.0-3.0 to be considered close to convergence (other scales' losses may vary a little).
timm is a repo with a large collection of pretrained image models trained on many foundation datasets.
My training procedure and final model used a backbone from timm. For this repo, I did train a working model from scratch without any modification, but it took a very long time to converge.
If you insist on using this repo as-is, please consider letting it train a bit longer or using a different scale/input size.
I see, do you still remember how many epochs you trained it for before it converged?
If I were to use a pretrained image model, would it work with this repo?
Lastly, what would be a good input size? Should I follow YOLO's 640x640?
Thank you so much, really appreciate your insights on this!
Batch 011, Loss: 38.4052, Time: 1501.4329 ms, LR: 0.0003
Batch 021, Loss: 35.8887, Time: 1365.1630 ms, LR: 0.0003
Batch 031, Loss: 32.6055, Time: 1346.5857 ms, LR: 0.0003
Batch 041, Loss: 34.3496, Time: 1372.9401 ms, LR: 0.0003
Batch 051, Loss: 34.1344, Time: 1379.9658 ms, LR: 0.0003
Batch 061, Loss: 30.9654, Time: 1339.9464 ms, LR: 0.0003
Batch 071, Loss: 35.8543, Time: 1368.0040 ms, LR: 0.0003
Batch 081, Loss: 32.0710, Time: 1375.1762 ms, LR: 0.0003
Batch 091, Loss: 34.5599, Time: 1363.6372 ms, LR: 0.0003
Batch 101, Loss: 33.0427, Time: 1351.2963 ms, LR: 0.0003
Batch 111, Loss: 31.3551, Time: 1442.6491 ms, LR: 0.0003
Batch 121, Loss: 33.5702, Time: 1375.8085 ms, LR: 0.0003
Batch 131, Loss: 32.3437, Time: 1341.2880 ms, LR: 0.0003
Batch 141, Loss: nan, Time: 1373.5526 ms, LR: 0.0003
Batch 151, Loss: nan, Time: 1360.2784 ms, LR: 0.0003
Batch 161, Loss: nan, Time: 1363.4261 ms, LR: 0.0003
Batch 171, Loss: nan, Time: 1344.6392 ms, LR: 0.0003
Batch 181, Loss: nan, Time: 1374.4329 ms, LR: 0.0003
Batch 191, Loss: nan, Time: 1435.9650 ms, LR: 0.0003
Batch 201, Loss: nan, Time: 1364.9449 ms, LR: 0.0003
Batch 211, Loss: nan, Time: 1391.6659 ms, LR: 0.0003
Batch 216, Loss: nan, Time: 680.1543 ms, LR: 0.0003
Unfortunately, it went to NaN with these parameters: python train.py --data /workspace/volume//License --lr 0.001 --bs 32 --size 640 --scale base
@SMDIndomitable You can choose any size, but I recommend lowering it since this model is designed to be lightweight. I often used 256 or 384; that should be enough.
The input image should be the bounding box around the vehicle (not the full image with lots of background and a mix of multiple vehicles, objects, etc.). Ideally this should be the output from an object detection model.
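That cropping step can be sketched like this; the frame here is a plain nested list standing in for a real array (e.g. an OpenCV/numpy image), and the box coordinates are made up for illustration:

```python
def crop_vehicle(image, box):
    """Crop a detected vehicle region out of a full frame.

    box = (x1, y1, x2, y2) in pixel coordinates, e.g. from a detector.
    """
    x1, y1, x2, y2 = box
    # Slice rows (height) first, then columns (width) within each row.
    return [row[x1:x2] for row in image[y1:y2]]

# Fake 100x80 frame where each "pixel" is just its (x, y) position.
frame = [[(x, y) for x in range(100)] for y in range(80)]
vehicle = crop_vehicle(frame, (10, 20, 60, 70))
print(len(vehicle), len(vehicle[0]))  # 50 rows, 50 columns
```

The crop is then resized to the training size (e.g. 256 or 384) before being fed to the plate model.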
On your error, I suspect it's related to these lines: https://github.com/huanidz/scaled-alpr-unconstrained/blob/cb99d5b10fe6f3b969e4a3c52d944ee72cb45d5b/train.py#L104-L110
What you can try:
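Common mitigations for a loss that suddenly turns NaN mid-epoch can be sketched like this (generic PyTorch, not this repo's exact training loop; the tiny model and data are placeholders):

```python
import torch

# Hypothetical tiny model/optimizer to illustrate the pattern;
# substitute the repo's real model, loss, and dataloader.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # a lower LR often helps stability

torch.autograd.set_detect_anomaly(True)  # pinpoints the op that produced NaN/Inf

x = torch.randn(32, 8)
y = torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()

# Clip exploding gradients before the optimizer step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Skip the update entirely if the loss already went non-finite.
if torch.isfinite(loss):
    optimizer.step()
```

If the GPU predates Ampere, also check any mixed-precision (bf16/fp16) settings in the script, since unsupported precision types are a common NaN source.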
Alright, thank you once again :D. I decided to finish the 200 epochs with these settings: lr=0.001, bs=16, epochs=200, eval_after=1, size=384, scale='base', before I experiment further. I will let you know how it turns out.
Edit: I read somewhere that training from scratch requires 20k epochs; is that true?
@SMDIndomitable No, you don't need 20k epochs. That is a ridiculous amount. The number of epochs depends on the dataset size and the model itself, but it's nowhere near that huge :smiley:
It's hard to estimate for your case, but I suggest keeping it between 200 and 300; that should be a good starting value. If you have a good dataset, maybe 50-100 is enough.
I see, thanks for the advice. Anyway, here is the result after running 261 epochs on the small model with lr=0.0003, bs=32, epochs=2000, eval_after=1, size=256, scale='small', resume_from=None:
[TEST ] IoU: 0.0352, F1_Score: 0.0593
Higher F1 score found. Saving model... Epoch 261/2000
I am going to continue training to see. Maybe the dataset I am using is not suitable; do you think lowering the number of images would help? My images are mostly cropped vehicles with the license plate polygon of 4 points.
@SMDIndomitable If you can, please let me see some of your training samples (image and polygon coordinates).
These are some of the training images; green boxes indicate the polygon.
These are some of the evaluation images; green boxes indicate the polygon. I had to erase the license plates for privacy reasons.
I used these images to train YOLOv8 segmentation. I do realize there is a lot of environment in them, as you mentioned. Should I recreate a dataset that only contains cropped images of the vehicles?
@SMDIndomitable Yes, you should. And please verify that the coordinates are in this format (xxxxyyyy):
# The order of 1-->4 is (x1 - y1: top left, x2 - y2: top right, x3 - y3: bottom right, x4 - y4: bottom left)
# x1, x2, x3, x4, y1, y2, y3, y4
0.497917, 0.677083, 0.670833, 0.489583, 0.734737, 0.747368, 0.844211, 0.831579
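A small sketch for parsing one label line in that grouped order and sanity-checking that the values look like normalized coordinates (the variable names are my own, not from the repo):

```python
line = "0.497917, 0.677083, 0.670833, 0.489583, 0.734737, 0.747368, 0.844211, 0.831579"

vals = [float(v) for v in line.split(",")]
assert len(vals) == 8, "expected 4 x's followed by 4 y's"

xs, ys = vals[:4], vals[4:]  # grouped layout: all x's first, then all y's
assert all(0.0 <= v <= 1.0 for v in vals), "coordinates should be normalized"

# Reassemble as (x, y) corner points: TL, TR, BR, BL
corners = list(zip(xs, ys))
print(corners)
```

Running a check like this over the whole label set quickly catches files that are still in an interleaved or unnormalized format.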
Yep, the coordinates should be correct; I drew the bounding box with the x1,x2,x3,x4,y1,y2,y3,y4 format, so I think they are right. The numbers are in normalized form just like YOLO, right?
@SMDIndomitable Different YOLO versions may use different formats like (x-center, y-center, w, h), so it depends. Basically you just need to ensure it's in this repo's format. Otherwise you can write a basic script to convert.
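Such a conversion script could look like this, assuming the source labels are YOLO-segmentation-style interleaved polygon points (x1 y1 x2 y2 ..., normalized) whose corner order already matches TL, TR, BR, BL:

```python
def yolo_poly_to_repo(values):
    """Convert interleaved [x1, y1, x2, y2, x3, y3, x4, y4] into the
    grouped [x1, x2, x3, x4, y1, y2, y3, y4] layout this repo expects."""
    xs = values[0::2]  # every even index -> x coordinates
    ys = values[1::2]  # every odd index  -> y coordinates
    return xs + ys

# Interleaved version of the example label above.
yolo = [0.497917, 0.734737, 0.677083, 0.747368,
        0.670833, 0.844211, 0.489583, 0.831579]
print(yolo_poly_to_repo(yolo))
```

If the corner order differs between formats, the points would also need to be re-sorted into TL, TR, BR, BL before grouping.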
By the way, feel free to add me on other chat platforms like Discord, Skype, etc. (huanidz - huannguyena2@gmail.com) if you still have questions.
Oh, what I mean is that the values are decimals. So I assume they are x / image.width and y / image.height to obtain the normalized values. Sure, I will add you on Discord if you're fine with that.
Hi, I tried training a model with the training script, but I either arrive at loss: nan or a very bad F1-score. Do you know what the reason could be? Also, are there any pretrained weights I can use for testing?