Error during training - GitHub Issues

leehao178 commented 4 years ago

Hi @yjh0410, hello. Environment: Ubuntu 16.04, CUDA 10.0, Python 3.6, PyTorch 1.1.0, torchvision 0.3.0

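I'm using voc0712.py to train on my own dataset. I had to change int to float on line 66 to get it to run; otherwise it raises ValueError: invalid literal for int() with base 10: '190.0'. After that change, training runs for a while and then fails with the following error: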

/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [384,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
[the same assertion is repeated for threads [385,0,0] through [415,0,0] of block [10,0,0]]
Epoch[3 / 250] || iter 210 || Loss: 23.8245 || || lr: 0.00005026 || || input size: 320 ||
Traceback (most recent call last):
  File "train_voc.py", line 206, in <module>
    train(fcos_lite, device)
  File "train_voc.py", line 151, in train
    cls_loss, ctn_loss, box_loss = tools.loss(out, targets, num_classes=args.num_classes)
  File "/home/danny/Lab/FCOS/FCOS-LITE/tools.py", line 120, in loss
    box_loss = torch.mean(torch.sum(box_loss_func(iou, gt_iou) * gt_pos, dim=-1) / N_pos)
RuntimeError: CUDA error: device-side assert triggered
[the same assertion is then repeated for threads [192,0,0] through [223,0,0] of block [10,0,0]]
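
The assertion in the log above says that every value passed to the BCE kernel as `input` must lie in [0, 1]. The following is a minimal, framework-free sketch of that check, not code from FCOS-LITE: the helper names `find_invalid_bce_inputs` and `bce` and the sample values are hypothetical. It also shows that NaN fails the check, since any comparison with NaN is false. Whether the `iou` tensor in the traceback actually contained NaN or out-of-range values is not known from the log.

```python
import math


def find_invalid_bce_inputs(values):
    """Return (index, value) pairs that would fail the kernel's range check."""
    bad = []
    for i, v in enumerate(values):
        # NaN compares False against everything, so it is caught here as well.
        if not (0.0 <= v <= 1.0):
            bad.append((i, v))
    return bad


def bce(p, target, eps=1e-12):
    """Binary cross-entropy for a single probability p in [0, 1]."""
    if not (0.0 <= p <= 1.0):
        raise ValueError(f"BCE input out of range: {p!r}")
    # Clamp away from exactly 0 and 1 so that log() stays finite.
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Hypothetical predicted IoU values, including the three failure modes.
ious = [0.0, 0.73, 1.0000001, -0.02, float("nan")]
bad = find_invalid_bce_inputs(ious)
print("values that would trip the assertion:", bad)
```

Because CUDA kernels run asynchronously, the Python traceback can point at a later line than the call that actually failed. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, which usually makes the reported line more trustworthy.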

yjh0410 commented 4 years ago

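Hello! I have not run into the problem you describe in voc2007.py. As for the error itself, it looks like the BCE function failed while the loss was being computed, but I can't determine the exact cause right now. Because of the pandemic I'm staying at home and cannot train FCOS-LITE, so I'm not yet aware of many of the bugs that may show up during training. The project is still half-finished, and I'm sorry it has caused you unnecessary trouble! I'm frustrated about this as well, but given the pandemic the university cannot yet announce when term will start and I cannot return to campus, so I hope you'll understand. Once I'm back, I will definitely debug this project!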
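
The ValueError reported earlier in this thread comes from Python itself: `int()` rejects a string containing a decimal point, so an annotation that stores a coordinate as '190.0' cannot be parsed with `int(text)`. The following is a minimal sketch of a tolerant parser, assuming Pascal VOC style XML with `bndbox`, `xmin`, `ymin`, `xmax` and `ymax` tags. The helper names `parse_coord` and `read_boxes` and the sample annotation are made up for illustration; line 66 of voc0712.py is not shown in this thread and may look different.

```python
import xml.etree.ElementTree as ET


def parse_coord(text):
    """Parse a pixel coordinate stored either as '190' or as '190.0'."""
    # int('190.0') raises ValueError, so convert through float first.
    return int(round(float(text)))


# Hypothetical annotation mixing integer and decimal coordinate strings.
SAMPLE = """<annotation><object><name>dog</name><bndbox>
<xmin>48.0</xmin><ymin>190.0</ymin><xmax>195</xmax><ymax>371.4</ymax>
</bndbox></object></annotation>"""


def read_boxes(xml_text):
    """Return one [xmin, ymin, xmax, ymax] list of ints per object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append(
            [parse_coord(bndbox.find(tag).text)
             for tag in ("xmin", "ymin", "xmax", "ymax")]
        )
    return boxes


boxes = read_boxes(SAMPLE)
print(boxes)
```

Rounding to an integer keeps the rest of the pipeline unchanged if it expects integer pixel coordinates; keeping the values as floats, as the reporter did, is the other option.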

------------------ Original message ------------------ From: "leehao178"<notifications@github.com>; Sent: Wednesday, March 4, 2020, 12:57 PM; To: "yjh0410/FCOS-LITE"<FCOS-LITE@noreply.github.com>; Cc: "ら．Secret"<1394571815@qq.com>;"Mention"<mention@noreply.github.com>; Subject: [yjh0410/FCOS-LITE] Error during training (#1)

/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [206,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [207,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [208,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [209,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [210,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [211,0,0] Assertion input >= 0. && input <= 1. failed. 
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [212,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [213,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [214,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [215,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [216,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [217,0,0] Assertion input >= 0. && input <= 1. failed. 
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [218,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [219,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [220,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [221,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [222,0,0] Assertion input >= 0. && input <= 1. failed. /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype , const Dtype , Dtype ) [with Dtype = float, Acctype = float]: block: [10,0,0], thread: [223,0,0] Assertion input >= 0. && *input <= 1. failed.

leehao178 commented 4 years ago

Got it, thank you! I'm looking forward to this nice project, and I hope the pandemic eases soon!

yjh0410 commented 4 years ago

Thank you very much! As soon as I have good results, I'll push the update to GitHub right away.


leehao178 commented 4 years ago

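Hello, I tested with the new version and I still get the error. It happens with VOC2007, VOC2012, and my own training set.

I'm not very experienced with this code, but here is what I observed: at https://github.com/yjh0410/FCOS-LITE/blob/6a474d9ac3385e5c8de47734f853349e79e34b0d/train_voc.py#L147 the value of out is all NaN, so the loss computed by tools.loss is NaN as well. I'm not sure whether that is what triggers the error. Thanks again to the author :)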

yjh0410 commented 4 years ago

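Hello! With the latest code, I don't get out=nan on my machine. For a NaN problem like this, I would check the following:
1. Print out at every iteration to see at which step NaN first appears.
2. Print the input image data as well, to confirm whether the input itself contains NaN.
These two are my usual approaches. If the cause is 1, I suspect the network diverged during training, which is something I run into often. If the cause is 2, hmm... I have used this data preprocessing code for a long time, including in my own approximate reproductions of yolo-v2 and v3, and NaN has never shown up in the input.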
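The two checks above can be sketched roughly as follows. This is only a minimal sketch, not code from the repository: `count_non_finite` and `check_step` are hypothetical helper names, and the tensors at the bottom are stand-ins for a real batch and the real model output.

```python
import torch

def count_non_finite(tensor):
    """Return how many entries of `tensor` are NaN or Inf."""
    bad = torch.isnan(tensor) | torch.isinf(tensor)
    return int(bad.sum().item())

def check_step(step, images, out):
    """Report whether the input or the output is the first to go bad.

    Returns "images", "out", or None when both are finite.
    """
    if count_non_finite(images) > 0:
        print(f"step {step}: input images contain NaN/Inf")
        return "images"
    if count_non_finite(out) > 0:
        print(f"step {step}: network output contains NaN/Inf")
        return "out"
    return None

# Stand-in tensors; in training these would be the batch and the model output.
images = torch.rand(2, 3, 8, 8)
out = torch.tensor([0.3, float("nan"), 0.7])
culprit = check_step(0, images, out)
```

Calling `check_step` right after the forward pass in the training loop and stopping at the first non-None result separates a data problem ("images") from a diverging network ("out").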

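Finally, thank you very much for raising this issue! I want to do my best to maintain this project, but many latent problems only show up after a complete training run, so there is little I can do in the meantime, even though I'm anxious about it too. The project is still at an early stage of development, so there will be plenty of problems. I'm sorry for the unnecessary trouble it has caused you!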

leehao178 commented 4 years ago

OK, got it. I'll give that a try, thanks!

yjh0410 commented 4 years ago

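Hello! I just found a bug in tools.py: my BCE_focal_loss function was written incorrectly. I've fixed it. When convenient, please try again and see whether the NaN problem still occurs.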
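For reference, a focal loss built on binary cross-entropy usually looks something like the sketch below. This is not the repository's BCE_focal_loss and not the actual fix, whose details are not given in this thread. The function name, the `alpha`/`gamma` defaults, and the `eps` clamp are all assumptions made for illustration.

```python
import torch

def bce_focal_loss_sketch(prob, target, alpha=0.25, gamma=2.0, eps=1e-6):
    """Element-wise focal loss on probabilities in [0, 1] (hypothetical helper).

    `prob` is expected to be a sigmoid output and `target` a 0/1 tensor of
    the same shape. Clamping keeps log() away from 0, but it cannot repair
    NaN values that are already present in `prob`.
    """
    prob = prob.clamp(min=eps, max=1.0 - eps)
    pos = -alpha * (1.0 - prob) ** gamma * target * torch.log(prob)
    neg = -(1.0 - alpha) * prob ** gamma * (1.0 - target) * torch.log(1.0 - prob)
    return pos + neg

# Extreme probabilities stay finite thanks to the clamp.
prob = torch.tensor([0.0, 1.0, 0.9, 0.1])
target = torch.tensor([1.0, 0.0, 1.0, 1.0])
loss = bce_focal_loss_sketch(prob, target)
```

A confident correct prediction (0.9 for a positive) gets a much smaller loss than a confident wrong one (0.1 for a positive), which is the point of the focusing term.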

yjh0410 commented 4 years ago

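Hello... I seem to have fixed the bug in my model. In tools.py, the loss computation has a variable N_pos that is later used as a divisor, but it can contain 0, and that is what led to problems such as NaN. I've fixed this bug, so it should work now.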
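To illustrate the failure mode described above: dividing a zero loss sum by a zero positive count gives NaN in tensor arithmetic. The sketch below shows one common guard, clamping the divisor to at least 1. It is an assumption about how such a fix can look, not necessarily the change that was made in tools.py, and `normalize_by_positives` is a hypothetical name.

```python
import torch

def normalize_by_positives(loss_sum, n_pos):
    """Divide per-image loss sums by positive counts, treating 0 as 1."""
    return loss_sum / n_pos.clamp(min=1.0)

# Second image has no positive samples, so its loss sum and count are both 0.
loss_sum = torch.tensor([4.0, 0.0])
n_pos = torch.tensor([2.0, 0.0])

naive = loss_sum / n_pos                           # 0 / 0 produces NaN
safe = normalize_by_positives(loss_sum, n_pos)     # image without positives contributes 0
```

Since an image without positives has a loss sum of 0 anyway, replacing its divisor with 1 leaves every other image's value unchanged.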

leehao178 commented 4 years ago

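Hello,

After your fix, training now runs normally. However, I noticed that whenever I save a .pth file, saving works at first, but after a while the computer freezes: the whole screen stops responding and I can only reboot. I suspect, though I'm not sure, that saving the checkpoint pushes GPU memory over the limit, because GPU memory was nearly full each time it froze. By the way, I'm using a 2080 Ti. After I reduced the batch size from 8 to 4, training ran to completion. Finally, thank you very much for your diligent updates, and I look forward to your future work!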

yjh0410 commented 4 years ago

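Hello! I'm also using a 2080 Ti, with the batch size set to 32 during training, and it trains to completion without any freezes. My current test results are quite poor: the mAP on VOC is below 40, and the FPS isn't very high either, slower than my yolo-v2. Next I will probably do some hyperparameter tuning or heavier modifications. I've pushed the test code to GitHub.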

yjh0410 / FCOS-RT_PyTorch

Training error #1