idekazuki opened 4 years ago
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [20,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [21,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [22,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
File "faster_rcnn.py", line 50, in <module>
train_one_epoch(model, optimizer, train_data_loader, device, epoch, print_freq=10)
File "/host/space0/ide-k/mygit/Detectron4epic/engine.py", line 32, in train_one_epoch
loss_dict = model(images, targets)
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 71, in forward
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 765, in forward
class_logits, box_regression, labels, regression_targets)
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 42, in fastrcnn_loss
sampled_pos_inds_subset = torch.nonzero(labels > 0).squeeze(1)
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered
This error was solved by changing the number of classes from 315 to 352.
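The assertion t >= 0 && t < n_classes fires when a target label falls outside the range the classifier head was built for. In torchvision's detection models, num_classes includes the background class (label 0), so every object label must be strictly below num_classes. The following is a minimal sketch that catches this on the CPU before training starts. It assumes targets are plain dicts with a "labels" list of ints, and the helper names required_num_classes and check_num_classes are hypothetical.

```python
from typing import Iterable, Mapping, Sequence

# Hypothetical target format: each target is a dict with a "labels" list of ints,
# mirroring the per-image "labels" field torchvision detection models expect.
Target = Mapping[str, Sequence[int]]


def required_num_classes(targets: Iterable[Target]) -> int:
    """Smallest num_classes (background included) that covers every label."""
    max_label = 0  # label 0 is the background class
    for target in targets:
        for label in target["labels"]:
            if label < 0:
                # A negative label would trip the `t >= 0` half of the assertion.
                raise ValueError(f"negative label {label}")
            max_label = max(max_label, int(label))
    # Valid class indices are 0 .. num_classes - 1.
    return max_label + 1


def check_num_classes(targets: Iterable[Target], num_classes: int) -> None:
    """Fail on the CPU with a readable message instead of a device-side assert."""
    needed = required_num_classes(targets)
    if needed > num_classes:
        raise ValueError(
            f"num_classes={num_classes} is too small: the largest label needs {needed}"
        )


# Usage with made-up labels:
targets = [{"labels": [3, 17]}, {"labels": [351]}]
print(required_num_classes(targets))  # 352
check_num_classes(targets, 352)  # passes; 315 would raise ValueError
```

Running a check like this once over the dataset gives an ordinary Python error. Without it, the asynchronous device-side assert surfaces at the next synchronizing call. In the traceback above, that call was torch.nonzero, which comes after the loss computation.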
Traceback (most recent call last):
File "faster_rcnn.py", line 50, in <module>
train_one_epoch(model, optimizer, train_data_loader, device, epoch, print_freq=10)
File "/host/space0/ide-k/mygit/Detectron4epic/engine.py", line 26, in train_one_epoch
for images, targets in metric_logger.log_every(data_loader, print_freq, header):
File "/host/space0/ide-k/mygit/Detectron4epic/utils.py", line 209, in log_every
for obj in iterable:
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yanai-lab/ide-k/ide-k/pyenv/detec2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/host/space0/ide-k/mygit/Detectron4epic/epicdatasets.py", line 104, in __getitem__
image = Image.fromarray(frames[frame_id])
IndexError: list index out of range
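Fixed by changing Image.fromarray(frames[frame_id]) to Image.fromarray(frames[frame_id - 1]), presumably because frame_id is 1-based while the frames list is 0-indexed.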
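The following is a minimal sketch of that off-by-one fix with an explicit bounds check. It assumes frame IDs start at 1, and the helper name frame_at is hypothetical, not taken from epicdatasets.py.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def frame_at(frames: Sequence[T], frame_id: int) -> T:
    """Return the frame for a 1-based frame_id from a 0-indexed sequence.

    Assumption: frame IDs start at 1, as the frame_id - 1 fix implies.
    """
    index = frame_id - 1
    # Explicit bounds check: Python would silently accept index -1
    # (frame_id == 0) and return the last frame instead of failing.
    if not 0 <= index < len(frames):
        raise IndexError(f"frame_id {frame_id} is outside 1..{len(frames)}")
    return frames[index]


frames = ["frame1", "frame2", "frame3"]
print(frame_at(frames, 3))  # frame3
```

The explicit check matters because a bare frames[frame_id - 1] turns frame_id = 0 into frames[-1]. That returns the wrong image without any error.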
Error: loss is nan
Changed the learning rate: lr = 0.005 -> lr = 0.0005
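A lower learning rate helps when training diverges, but it may not be enough when the data itself is bad, which is where the next entry points. It can help to see which loss term goes non-finite first. The following is a minimal sketch that assumes the per-term losses have already been converted to Python floats. The helper name total_loss_or_raise is hypothetical. The keys in the usage example are the loss names torchvision's Faster R-CNN ROI heads return.

```python
import math
from typing import Mapping


def total_loss_or_raise(loss_dict: Mapping[str, float]) -> float:
    """Sum per-term losses, naming any term that is NaN or infinite."""
    bad = {name: value for name, value in loss_dict.items() if not math.isfinite(value)}
    if bad:
        raise FloatingPointError(f"non-finite loss terms: {bad}")
    return sum(loss_dict.values())


# Usage with made-up values (e.g. collected via loss.item() for each entry):
losses = {"loss_classifier": 0.5, "loss_box_reg": 0.25}
print(total_loss_or_raise(losses))  # 0.75
```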
Error: loss is nan. Check the dataset: xmax, ymax must be greater than xmin, ymin, and all of xmin, ymin, xmax, ymax must lie within the image.
Found a total of 4 samples with invalid values and deleted them.
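The box checks from the previous two notes can be scripted so that invalid samples are listed before training, rather than discovered through a NaN loss. The following is a minimal sketch that assumes each sample is available as a plain tuple of id, image size, and (xmin, ymin, xmax, ymax) boxes. The record layout and the helper names box_problems and find_invalid_samples are hypothetical, not taken from epicdatasets.py.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels
# Hypothetical sample record: (sample_id, image_width, image_height, boxes)
Sample = Tuple[str, int, int, Sequence[Box]]


def box_problems(box: Box, width: int, height: int) -> List[str]:
    """List what is wrong with one box; an empty list means the box is valid."""
    xmin, ymin, xmax, ymax = box
    problems = []
    # `not (a > b)` also flags NaN coordinates, since comparisons with NaN are False.
    if not (xmax > xmin):
        problems.append("xmax <= xmin")
    if not (ymax > ymin):
        problems.append("ymax <= ymin")
    if xmin < 0 or ymin < 0:
        problems.append("min coordinate below 0")
    # Box edges may touch the image border, so xmax == width is allowed.
    if xmax > width or ymax > height:
        problems.append("max coordinate outside image")
    return problems


def find_invalid_samples(samples: Iterable[Sample]) -> Dict[str, List[str]]:
    """Map each sample_id with at least one bad box to its problems."""
    invalid: Dict[str, List[str]] = {}
    for sample_id, width, height, boxes in samples:
        issues = [p for box in boxes for p in box_problems(box, width, height)]
        if issues:
            invalid[sample_id] = issues
    return invalid


samples = [
    ("a", 640, 480, [(10, 20, 100, 200)]),
    ("b", 640, 480, [(50, 60, 40, 70)]),
    ("c", 640, 480, [(0, 0, 700, 100)]),
]
print(find_invalid_samples(samples))
# {'b': ['xmax <= xmin'], 'c': ['max coordinate outside image']}
```

Allowing an edge to sit exactly on the border is a design choice. Tighten the comparison if the annotations use inclusive pixel indices.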
This error may be solved by adding the code below.