Open MaarufB opened 3 years ago
I hit this issue too. I know the reason: "The category_id will be set to -1 if the category annotations miss." See https://github.com/pytorch/pytorch/issues/1204: the target passed to the criterion must satisfy t >= 0 && t < n_classes. Maybe you can try changing the label -1 to a large number, as long as it stays below n_classes.
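A quick way to confirm this before training is to scan the labels for anything outside the valid range. This is only a minimal sketch in plain Python; the function name `find_invalid_labels` and the sample labels are made up for illustration and are not part of mmskeleton.

```python
def find_invalid_labels(labels, n_classes):
    """Return (position, label) pairs that violate 0 <= label < n_classes."""
    return [(i, t) for i, t in enumerate(labels) if not (0 <= t < n_classes)]


# Hypothetical labels: -1 stands for a sample whose category annotation is missing.
labels = [0, 2, -1, 1, 3]
bad = find_invalid_labels(labels, n_classes=3)
print(bad)  # [(2, -1), (4, 3)]
```

Any entry reported here would trip the same assertion on the GPU, so it is worth fixing the annotations (or the class count) first.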
Following CUSTOM_DATASET.md, I got this error when using my own dataset without changing the num_class: 3 parameter in the train.yaml used by 'mmskl configs/recognition/st_gcn/dataset_example/train.yaml'.
You may also need to change the default 'num_class: 3' in test.yaml to your real number of classes.
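To see whether the configured value is large enough, you can compare it with the labels in your dataset. A rough sketch, assuming zero-based integer labels already loaded into a list; `required_num_class` and the sample values are hypothetical, not mmskeleton code.

```python
def required_num_class(labels):
    """Smallest num_class that makes every zero-based label valid."""
    if min(labels) < 0:
        raise ValueError("negative label found; fix the annotations first")
    return max(labels) + 1


labels = [0, 4, 2, 1, 3]   # hypothetical labels from a custom dataset
configured_num_class = 3   # the default value from the example config
needed = required_num_class(labels)
if needed > configured_num_class:
    print(f"num_class should be at least {needed}, not {configured_num_class}")
```

With these sample labels the check reports that num_class needs to be at least 5.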
The problem was solved by changing the label indices from [1, N] to [0, N-1]. After debugging, I found that the error occurred at line 284 (./mmskeleton/mmskeleton/processor/recognition.py).
I checked the official documentation and learned that all target indices must lie in the range [0, C-1], where C is the number of classes.
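For anyone with the same one-based labels, the shift could look roughly like this. It is a sketch in plain Python under the assumption that the labels are integers in [1, N]; `shift_to_zero_based` is a hypothetical helper, not something from mmskeleton.

```python
def shift_to_zero_based(labels, n_classes):
    """Map one-based labels in [1, n_classes] to zero-based labels in [0, n_classes - 1]."""
    if not all(1 <= t <= n_classes for t in labels):
        raise ValueError("labels are not all in [1, n_classes]")
    return [t - 1 for t in labels]


one_based = [1, 3, 2, 3]  # hypothetical labels for a 3-class dataset
print(shift_to_zero_based(one_based, n_classes=3))  # [0, 2, 1, 2]
```

The range check is there so the shift is not applied twice by accident: labels that are already zero-based contain a 0 and raise an error instead of becoming -1.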
Successful screenshot:
I ran this command to train the ST-GCN model: mmskl configs/recognition/st_gcn/dataset_example/train.yaml
Load configuration information from configs/recognition/st_gcn/dataset_example/train.yaml
INFO:mmcv.runner.runner:Start running, host: ai-pose@aipose-X570-GAMING-X, work_dir: /home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/work_dir/recognition/st_gcn/custom_dataset
INFO:mmcv.runner.runner:workflow: [('train', 5), ('val', 1)], max: 65 epochs
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype , Dtype , Dtype , long , Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "/home/ai-pose/anaconda3/envs/mm-test/bin/mmskl", line 7, in
    exec(compile(f.read(), file, 'exec'))
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/tools/mmskl", line 131, in
    main()
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/tools/mmskl", line 121, in main
    call_obj(cfg.processor_cfg)
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/mmskeleton/utils/importer.py", line 24, in call_obj
    return import_obj(type)(kwargs)
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/mmskeleton/processor/recognition.py", line 120, in train
    runner.run(data_loaders, workflow, total_epochs, loss=loss)
  File "/home/ai-pose/anaconda3/envs/mm-test/lib/python3.7/site-packages/mmcv/runner/runner.py", line 359, in run
    epoch_runner(data_loaders[i], kwargs)
  File "/home/ai-pose/anaconda3/envs/mm-test/lib/python3.7/site-packages/mmcv/runner/runner.py", line 263, in train
    self.model, data_batch, train_mode=True, kwargs)
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/mmskeleton/processor/recognition.py", line 135, in batch_processor
    log_vars = dict(loss=losses.item())
RuntimeError: CUDA error: device-side assert triggered