luizgh / sigver

Signature verification package for learning representations from signature data and training user-dependent classifiers.
BSD 3-Clause "New" or "Revised" License

RuntimeError: invalid argument 2: non-empty vector or matrix expected at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:32 #17

Closed: devendraswamy closed this issue 4 years ago.

devendraswamy commented 4 years ago

CUDA_VISIBLE_DEVICES=0 python -m sigver.featurelearning.train --model signet --dataset-path /home/dell/Documents/Prasad_AI/sign_similarity/sigver/dataset_npz/dataset_npz.npz --users 20 55 --model signet --epochs 60 --forg --lamb 0.95 --logdir signet_f_lamb0.95

Namespace(batch_size=32, checkpoint=None, dataset_path='/home/dell/Documents/Prasad_AI/sign_similarity/sigver/dataset_npz/dataset_npz.npz', epochs=60, forg=True, gpu_idx=0, input_size=(150, 220), lamb=0.95, logdir='signet_f_lamb0.95', loss_type='L2', lr=0.001, lr_decay=0.1, lr_decay_times=3, model='signet', momentum=0.9, seed=42, test=False, users=[20, 55], visdomport=None, weight_decay=0.0001)
Using device: cuda:0
Loading Data
Initializing Model
Training
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:180: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  x = torch.tensor(x, dtype=torch.float).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:181: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  y = torch.tensor(y, dtype=torch.long).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:182: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  yforg = torch.tensor(batch[2], dtype=torch.float).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:278: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  x = torch.tensor(x, dtype=torch.float).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:279: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  y = torch.tensor(y, dtype=torch.long).to(device)
/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py:280: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  yforg = torch.tensor(yforg, dtype=torch.float).to(device)
Epoch 0. Val loss: 3.4440, Val acc: 10.22%,Val forg loss: 0.5179, Val forg acc: 79.69%
Epoch 1. Val loss: 3.2476, Val acc: 20.76%,Val forg loss: 0.4181, Val forg acc: 84.38%
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py", line 428, in <module>
    main(arguments)
  File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py", line 369, in main
    device, logger, args, logdir)
  File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py", line 81, in train
    epoch, optimizer, lr_scheduler, callback, device, args)
  File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/sigver/featurelearning/train.py", line 201, in train_epoch
    class_loss = F.cross_entropy(logits, y[yforg == 0])
  File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/myenv/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/dell/Documents/Prasad_AI/sign_similarity/sigver/myenv/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: invalid argument 2: non-empty vector or matrix expected at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:32
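The UserWarnings in the log come from calling torch.tensor() on values that are already tensors. A minimal sketch of one way to silence them, assuming each batch arrives as tensors, NumPy arrays or lists: it swaps in torch.as_tensor (a real PyTorch function) instead of the clone().detach() route the warning text suggests, since here the values only need a dtype and device, not a detached copy. batch_to_device is a hypothetical helper, not part of sigver.

```python
import torch


def batch_to_device(x, y, yforg, device):
    """Hypothetical helper: move one batch to `device` with the dtypes seen in the log.

    torch.as_tensor accepts tensors, NumPy arrays and lists. For an input that is
    already a tensor it does not take the copy-construct path of torch.tensor(),
    so no UserWarning is emitted; it only copies when a dtype change requires it.
    """
    x = torch.as_tensor(x, dtype=torch.float).to(device)
    y = torch.as_tensor(y, dtype=torch.long).to(device)
    yforg = torch.as_tensor(yforg, dtype=torch.float).to(device)
    return x, y, yforg


# Usage: a tiny fake batch of two 150x220 images on the CPU.
x, y, yforg = batch_to_device(torch.zeros(2, 1, 150, 220), [3, 7], [0, 1], torch.device("cpu"))
print(x.dtype, y.dtype, yforg.dtype)  # torch.float32 torch.int64 torch.float32
```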

luizgh commented 4 years ago

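Thanks for submitting the issue. It looks like this is a bug in the training code that occurs when a batch contains only forgeries: in that case features[yforg == 0] is empty, so y[yforg == 0] is also empty and F.cross_entropy receives no samples to compute the classification loss on.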
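A minimal sketch of one way to guard against this case, assuming the classification logits are computed for every sample in the batch and masked afterwards (the traceback shows train.py instead passing already-selected logits together with y[yforg == 0]). genuine_class_loss is a hypothetical name, not a sigver function, and this is not necessarily how the repository was patched. Note that with an empty batch, F.cross_entropy raised the error above in the PyTorch version used here; more recent releases may instead return NaN, which is just as harmful to training, so an explicit check is useful either way.

```python
import torch
import torch.nn.functional as F


def genuine_class_loss(logits: torch.Tensor, y: torch.Tensor, yforg: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: user-classification loss over genuine signatures only.

    `logits` has one row per sample in the batch, `y` holds the user labels,
    and `yforg` is 1 for skilled forgeries and 0 for genuine signatures.
    """
    genuine = yforg == 0
    if not genuine.any():
        # Forgery-only batch: nothing to classify, so contribute a zero loss
        # on the same device and dtype instead of calling cross_entropy on
        # an empty tensor.
        return torch.zeros((), device=logits.device, dtype=logits.dtype)
    return F.cross_entropy(logits[genuine], y[genuine])


# Usage: a batch of 4 samples over 3 users, all of them forgeries.
logits = torch.randn(4, 3)
y = torch.tensor([0, 1, 2, 1])
yforg = torch.tensor([1.0, 1.0, 1.0, 1.0])
print(genuine_class_loss(logits, y, yforg))  # tensor(0.)
```

Returning a zero loss lets the forgery term still drive the update for that batch; skipping the batch entirely would be an equally reasonable choice.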

luizgh commented 4 years ago

I updated the code to handle this scenario. Can you please git pull and try again?