luizgh / sigver

Signature verification package for learning representations from signature data and training user-dependent classifiers.
BSD 3-Clause "New" or "Revised" License

ZeroDivisionError: integer division or modulo by zero #21

Open MuadDev opened 3 years ago

MuadDev commented 3 years ago

Dear Luis,

Thank you for making this code publicly available. However, I encounter an error while trying to train on the CEDAR dataset; perhaps you could help me?

The error looks like this: [screenshot of a `ZeroDivisionError: integer division or modulo by zero` traceback]

The script I use to start training looks like this:

python3.7 -m sigver.featurelearning.train \
    --dataset cedar \
    --model signet \
    --dataset-path /data/signature_matching/data/processed/sigver_datasets/cedar.npz \
    --users 10 20 \
    --epochs 2 \
    --logdir /tmp/signet

The cedar.npz dataset was preprocessed as indicated in the README.

Hope you can help me with this.

susreetha5 commented 2 years ago

[Reposts the same question and training command verbatim.]

luizgh commented 2 years ago

The problem seems to be the combination of the parameters `epochs` and `lr_decay_times` (the latter defaults to 3).

For this project, I used a step decay for the learning rate (like the original ResNet paper), with a default of 3 decays. So if you train for, say, 60 epochs, the learning rate decays at epochs 20, 40 and 60. The problem here is that you are asking to train for 2 epochs while keeping the default of 3 decays: the step size is computed by integer division in https://github.com/luizgh/sigver/blob/master/sigver/featurelearning/train.py#L71, so the scheduler is told to change the learning rate every 0 epochs, which fails.
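To make the failure concrete, here is a minimal sketch of that interaction, assuming (as the linked line suggests) that the `StepLR` step size is computed as `epochs // lr_decay_times`; the model and optimizer are throwaway stand-ins:

```python
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

epochs = 2          # --epochs 2, as in the command above
lr_decay_times = 3  # assumed default of the training script

model = nn.Linear(10, 2)                           # throwaway model
optimizer = optim.SGD(model.parameters(), lr=1e-3)

# Integer division: 2 // 3 == 0, so the scheduler is asked to
# decay the learning rate every 0 epochs.
step_size = epochs // lr_decay_times
print(step_size)  # 0

scheduler = StepLR(optimizer, step_size=step_size, gamma=0.1)
scheduler.step()  # ZeroDivisionError: integer division or modulo by zero
```

The constructor does not validate `step_size`, so the crash only surfaces at the first `scheduler.step()` after an epoch, when `last_epoch % step_size` is evaluated.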

So your options are:

1. Increase the number of epochs.
2. Decrease `lr_decay_times`.
3. Change the learning rate scheduler; nowadays I use cosine annealing for my projects: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingLR.html (see the sketch below).
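If you go with option 3, here is a minimal, self-contained sketch of the swap (`train_one_epoch` is a hypothetical placeholder for the actual training loop):

```python
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR

epochs = 2
model = nn.Linear(10, 2)                           # throwaway model
optimizer = optim.SGD(model.parameters(), lr=1e-3)

# Cosine annealing decays the LR smoothly over T_max epochs with no
# step-count division, so it is safe even for very short runs.
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # train_one_epoch(model, optimizer)  # hypothetical training loop
    scheduler.step()
```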