Open huyhuyvu01 opened 1 year ago

As the title states, when training with a custom dataset, the train and validation sample counts are doubled in the command prompt, which I think affects the training speed and the accuracy of the training process. My dataset consists of 361 training samples and 42 validation samples.

Here is my training command:

python train.py --data data_configs/traffic.yaml --epochs 50 --model fasterrcnn_resnet50_fpn --name trafficSign_detection_no_bg --batch 12

Here is my data config file:

TRAIN_DIR_IMAGES: data/traffic-sign/train/images
TRAIN_DIR_LABELS: data/traffic-sign/train/annotations
VALID_DIR_IMAGES: data/traffic-sign/valid/images
VALID_DIR_LABELS: data/traffic-sign/valid/annotations

CLASSES: [
  'cam_di_nguoc_chieu', 'cam_oto', 'cam_oto_re_phai', 'cam_mo_to',
  'cam_oto_va_moto', 'cam_ng_di_bo', 'cam_re_trai', 'cam_re_phai',
  'cam_quay_dau_trai', 'max_spd_40', 'max_spd_50', 'max_spd_60',
  'max_spd_80', 'cam_dung_do', 'cam_do', 'duong_giao_nhau',
  'giao_nhau_vs_ko_uu_tien', 'giao_nhau_vs_ko_uu_tien_trai',
  'giao_nhau_vs_uu_tien', 'dg_co_ng_di_bo_cat_ngang', 'tre_em_qua_duong',
  'cong_truong', 'day_cap', 'slow', 'huong_phai_di', 'danh_cho_ng_di_bo',
  'dg_mot_chieu', 'dg_cho_oto',
  'background',
]
NC: 28
SAVE_VALID_PREDICTION_IMAGES: True

The training process also takes a very long time, 2+ hours on an RTX 3060, and the mAP is very low, around 0.23 after 50 epochs with a batch size of 12.
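One pattern that can produce an exact 2x count on Windows, offered here only as a hypothesis about the loader rather than a confirmed cause: if a dataset class globs several case-variant extensions (e.g. *.jpg and *.JPG), the case-insensitive filesystem matches every file once per pattern. A minimal sketch of the effect, with the path taken from the config above:

```python
import glob
import os

# Hypothetical illustration (not the repo's actual loader code): on Windows,
# globbing both lowercase and uppercase extensions matches every file once
# per pattern, so each image is counted twice.
img_dir = "data/traffic-sign/train/images"  # path from the config above
paths = []
for pattern in ("*.jpg", "*.JPG"):
    paths.extend(glob.glob(os.path.join(img_dir, pattern)))

print("raw glob matches:", len(paths))                              # e.g. 722
print("unique files    :", len({os.path.normcase(p) for p in paths}))  # e.g. 361
```

If that is the cause, deduplicating the globbed list (or globbing a single pattern) would restore the true 361/42 counts.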
Can you paste the output of the first few lines here?
Here are the first few lines from the command prompt:
(DeepLearning) D:\Knowledge\MachineLearning\FasterRCNN>python train.py --data data_configs/traffic.yaml --epochs 50 --model fasterrcnn_resnet50_fpn --name trafficSign_detection_no_bg --batch 12
Not using distributed mode
wandb: Currently logged in as: huyhuyvu01. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in D:\Knowledge\MachineLearning\FasterRCNN\wandb\run-20230429_185442-lb7qygux
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run trafficSign_detection_no_bg
wandb: View project at https://wandb.ai/huyhuyvu01/uncategorized
wandb: View run at https://wandb.ai/huyhuyvu01/uncategorized/runs/lb7qygux
device cuda
Creating data loaders
Number of training samples: 722
Number of validation samples: 84
UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=FasterRCNN_ResNet50_FPN_Weights.COCO_V1`. You can also use `weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
C:\Users\huyhu\anaconda3\envs\DeepLearning\lib\site-packages\torchinfo\torchinfo.py:477: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
action_fn=lambda data: sys.getsizeof(data.storage()),
C:\Users\huyhu\anaconda3\envs\DeepLearning\lib\site-packages\torch\storage.py:665: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return super().sizeof() + self.nbytes()
41,432,411 total parameters. 41,210,011 training parameters.
Epoch: [0]  [ 0/61]  eta: 0:14:41  lr: 0.000018  loss: 4.2775 (4.2775)  loss_classifier: 3.5476 (3.5476)  loss_box_reg: 0.0632 (0.0632)  loss_objectness: 0.3510 (0.3510)  loss_rpn_box_reg: 0.3157 (0.3157)  time: 14.4483  data: 12.7076  max mem: 8579
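As an aside, the torchvision deprecation warning in that log is cosmetic. Following the warning message itself, the non-deprecated way to request the pretrained COCO weights since torchvision 0.13 is the weights enum; a minimal standalone sketch, separate from this repo's train.py:

```python
import torchvision
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights

# Explicit weights enum replaces the deprecated pretrained=True / bare-weights path
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT  # currently COCO_V1
)
```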
That's odd. I never faced that issue. Just to be sure, can you please recheck your image directory once more?
Here is the corresponding directory; the paths are relative to train.py, which is in the FasterRCNN folder.
The file names and folder structure of the labels are also correct.
There is also another folder called test in the data directory, but I don't think it's the cause of the issue.
Here is the link to my dataset in case you need it: https://universe.roboflow.com/ictu/vietnam-traffic-signs-detection2/dataset/1
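For completeness, a doubled count in the loader can be distinguished from genuinely duplicated files by counting what is on disk with os.listdir, which lists each directory entry exactly once; a minimal check, assuming the directory layout above:

```python
import os

# Count each image file once; os.listdir never repeats a directory entry.
for split in ("train", "valid"):
    img_dir = os.path.join("data", "traffic-sign", split, "images")
    images = [f for f in os.listdir(img_dir)
              if f.lower().endswith((".jpg", ".jpeg", ".png"))]
    print(split, "images on disk:", len(images))  # expected: 361 and 42
```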
Thanks. Will check it out.