bbangbin2780 opened this issue 3 years ago
@bbangbin2780 it seems that the numbers of char_context, x, and global_context are not equal. This implementation only trains with a batch size of 4 on 4 GPUs. Our 64 classes are text, 0-9, a-z, A-Z, and background.
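For anyone counting along, a rough sketch of that class layout (an assumption for illustration; the exact names and ordering live in the repo's dataset/metadata code):

```python
# Rough sketch of the 64-class layout described above (assumed layout,
# not copied from the repo): "text" + 0-9 + a-z + A-Z + background.
import string

char_classes = list(string.digits + string.ascii_lowercase + string.ascii_uppercase)  # 62 single characters
thing_classes = ["text"] + char_classes        # 63 foreground classes
num_classes_with_background = len(thing_classes) + 1
print(num_classes_with_background)             # 64
```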
I appreciate your answer, thanks.
I modified my config file (batch size 4),
and then the error below occurred:
Traceback (most recent call last):
File "tools/train_net.py", line 161, in <module>
args=(args,),
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/launch.py", line 49, in launch
daemon=False,
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/launch.py", line 84, in _distributed_worker
main_func(*args)
File "/home/ensa/JYB/TextFuseNet/tools/train_net.py", line 149, in main
return trainer.train()
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/defaults.py", line 356, in train
super().train(self.start_iter, self.max_iter)
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/train_loop.py", line 212, in run_step
loss_dict = self.model(data)
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/meta_arch/rcnn.py", line 88, in forward
_, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 581, in forward
losses = self._forward_box(features_list, proposals)
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 650, in _forward_box
return outputs.losses()
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/fast_rcnn.py", line 267, in losses
"loss_box_reg": self.smooth_l1_loss(),
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/fast_rcnn.py", line 209, in smooth_l1_loss
self.proposals.tensor, self.gt_boxes.tensor
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/box_regression.py", line 66, in get_deltas
assert (src_widths > 0).all().item(), "Input boxes to Box2BoxTransform are not valid!"
RuntimeError: CUDA error: device-side assert triggered
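A note on this failure mode: a device-side assert is usually raised by an earlier CUDA kernel than the Python line shown in the traceback, and with custom datasets the two usual suspects are degenerate ground-truth boxes (zero or negative width/height) and category ids outside the configured class range. A minimal check of a COCO-format annotation file (the path and the class count are placeholders, not values from this repo):

```python
# Sanity-check a COCO-format annotation file before training.
# "train.json" and NUM_FOREGROUND_CLASSES are placeholders for your own dataset.
import json

NUM_FOREGROUND_CLASSES = 63  # adjust to whatever your config expects

with open("train.json") as f:
    coco = json.load(f)

# COCO bbox layout is [x, y, w, h]; width/height must be strictly positive.
bad_boxes = [a for a in coco["annotations"]
             if a["bbox"][2] <= 0 or a["bbox"][3] <= 0]
print(len(bad_boxes), "annotations with non-positive width/height")

category_ids = {a["category_id"] for a in coco["annotations"]}
print(len(coco["categories"]), "categories defined,", len(category_ids), "used")
if len(coco["categories"]) > NUM_FOREGROUND_CLASSES:
    print("more categories than the model is configured for -> class indices can go out of range")
```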
If the number of classes is greater than 64, does this error occur?
I want to train on at least 1000 characters.
Thanks
@bbangbin2780 IMS_PER_BATCH should be set to 4 when using 4 GPUs. If you set more classes, the pred_branches in our model will be skipped when training your custom dataset.
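In config terms, that presumably corresponds to something like the fragment below; the values are illustrative and not copied from the released configs:

```yaml
# Illustrative fragment only; key names follow standard detectron2 configs.
SOLVER:
  IMS_PER_BATCH: 4   # total images per iteration across all GPUs, i.e. 1 per GPU with --num-gpus 4
MODEL:
  ROI_HEADS:
    NUM_CLASSES: 63  # foreground classes only; detectron2 adds the background class internally
```

Note that, per the reply above, simply raising NUM_CLASSES beyond the 63 built-in character classes does not extend the character prediction branches; in that case the pred_branches are skipped.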
I have four GPUs (4 TITAN RTX).
If the class number is over 64, that error occurs.
@bbangbin2780 if you change the number of classes, there are several configs that should be modified in detectron2/data/datasets/builtin.py as well
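A sketch of the kind of builtin.py edit that refers to, using the dict name quoted later in this thread (the split name and paths are placeholders):

```python
# In detectron2/data/datasets/builtin.py of this fork. The dict name follows a
# later comment in this thread; treat the key and paths as placeholders.
PREDEFINED_SPLITS_COCO["coco"]["my_korean_train"] = (
    "datasets/my_korean/train_images",            # image root
    "datasets/my_korean/annotations/train.json",  # COCO-format annotation file
)

# If the class set changes, the class/category metadata used for this split
# (and MODEL.ROI_HEADS.NUM_CLASSES in the yaml config) must stay consistent
# with the category ids in the json, otherwise training can hit a
# device-side assert on the GPU.
```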
Why is pred_branches in your model skipped if I set the number of classes to more than 63?
I read your paper again, but I don't understand why pred_branches is skipped.
@Real-YeJ I have the same problem. I have updated the file detectron2/data/datasets/builtin.py, and this is my config:
_BASE_: "./Base-RCNN-FPN.yaml"
MODEL:
  MASK_ON: True
  TEXTFUSENET_MUTIL_PATH_FUSE_ON: True
  WEIGHTS: ""
  PIXEL_STD: [57.375, 57.120, 58.395]
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 50
  ROI_HEADS:
    NMS_THRESH_TEST: 0.3
  TEXTFUSENET_SEG_HEAD:
    FPN_FEATURES_FUSED_LEVEL: 2
    POOLER_SCALES: (0.0625,)
DATASETS:
  TRAIN: ("mydataset",)
  TEST: ("mydataset",)
SOLVER:
  IMS_PER_BATCH: 1
  BASE_LR: 0.001
  STEPS: (40000,80000,)
  MAX_ITER: 120000
  CHECKPOINT_PERIOD: 2500
INPUT:
  MIN_SIZE_TRAIN: (800,1000,1200)
  MAX_SIZE_TRAIN: 1500
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1500
OUTPUT_DIR: "./out_dir_r101/icdar2013_model/"
and my command line is:
python train_net.py --num-gpus 1 --config-file configs/ocr/icdar2013_101_FPN.yaml
In the file detectron2/data/datasets/builtin.py I added one more key to the dict PREDEFINED_SPLITS_COCO["coco"]:
"mydataset":("F:/project_2/New_folder/data/downloads", "F:/project_2/New_folder/data/downloads/train.json")
But it still has the issue below:
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/box_regression.py", line 66, in get_deltas
assert (src_widths > 0).all().item(), "Input boxes to Box2BoxTransform are not valid!"
RuntimeError: CUDA error: device-side assert triggered
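One generic debugging step, independent of this repo: re-running the same command with synchronous CUDA launches usually makes the traceback point at the kernel that actually failed rather than at a later synchronization point, e.g.:

```bash
CUDA_LAUNCH_BLOCKING=1 python train_net.py --num-gpus 1 --config-file configs/ocr/icdar2013_101_FPN.yaml
```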
I have a question while training on a Korean dataset.
I followed the steps below to proceed:
The config file is below (I just changed the dataset name in the total-text config file).
I registered the dataset with coco_register in detectron2/data/datasets/builtin.py.
An error occurs during training.
To test whether training is possible at all, I tested with just 3 images, and this error still occurred.
I compared your sample COCO format with my COCO format, but they were the same.
I need to train on at least 1000 characters. Is this error related to the number of characters, or to the input size?
Thank you for reading, please help...