locuslab / SATNet

Bridging deep learning and logical reasoning using a differentiable satisfiability solver.
MIT License

Training on CPU: "invalid on input" #4

Open fdietze opened 5 years ago

fdietze commented 5 years ago

Hi, I'm trying to run the parity experiment locally on my CPU:

python exps/parity.py --seq=20

But at epoch 18 I get the error "invalid on input":

Epoch 0 Test  Loss 1.3451 Err: 0.5060: 100%|████████████████████████████████| 2/2 [00:00<00:00,  4.53it/s]
TESTING SET RESULTS: Average loss: 1.3595 Err: 0.5100
Epoch 1 Train Loss 0.6839 Err: 0.4100: 100%|██████████████████████████████| 90/90 [00:05<00:00, 17.56it/s]
Epoch 1 Test  Loss 0.7007 Err: 0.5060: 100%|████████████████████████████████| 2/2 [00:00<00:00,  5.67it/s]
TESTING SET RESULTS: Average loss: 0.7005 Err: 0.5100
Epoch 2 Train Loss 0.6832 Err: 0.4100: 100%|██████████████████████████████| 90/90 [00:04<00:00, 18.22it/s]
Epoch 2 Test  Loss 0.7004 Err: 0.5060: 100%|████████████████████████████████| 2/2 [00:00<00:00,  5.39it/s]
TESTING SET RESULTS: Average loss: 0.6999 Err: 0.5100
Epoch 3 Train Loss 0.6813 Err: 0.4100: 100%|██████████████████████████████| 90/90 [00:05<00:00, 17.29it/s]
Epoch 3 Test  Loss 0.7003 Err: 0.5060: 100%|████████████████████████████████| 2/2 [00:00<00:00,  4.58it/s]
TESTING SET RESULTS: Average loss: 0.6994 Err: 0.5100
Epoch 4 Train Loss 0.6814 Err: 0.3900: 100%|██████████████████████████████| 90/90 [00:06<00:00, 14.00it/s]
Epoch 4 Test  Loss 0.7008 Err: 0.5120: 100%|████████████████████████████████| 2/2 [00:00<00:00,  4.53it/s]
TESTING SET RESULTS: Average loss: 0.7012 Err: 0.5140
Epoch 5 Train Loss 0.6823 Err: 0.4100: 100%|██████████████████████████████| 90/90 [00:06<00:00, 14.10it/s]
Epoch 5 Test  Loss 0.6999 Err: 0.5060: 100%|████████████████████████████████| 2/2 [00:00<00:00,  4.49it/s]
TESTING SET RESULTS: Average loss: 0.6996 Err: 0.5100
Epoch 6 Train Loss 0.6763 Err: 0.3800: 100%|██████████████████████████████| 90/90 [00:06<00:00, 14.06it/s]
Epoch 6 Test  Loss 0.7024 Err: 0.5120: 100%|████████████████████████████████| 2/2 [00:00<00:00,  4.52it/s]
TESTING SET RESULTS: Average loss: 0.7028 Err: 0.5130
Epoch 7 Train Loss 0.6836 Err: 0.4100: 100%|██████████████████████████████| 90/90 [00:06<00:00, 13.74it/s]
Epoch 7 Test  Loss 0.6986 Err: 0.5060: 100%|████████████████████████████████| 2/2 [00:00<00:00,  4.26it/s]
TESTING SET RESULTS: Average loss: 0.6986 Err: 0.5100
Epoch 8 Train Loss 0.6854 Err: 0.4100: 100%|██████████████████████████████| 90/90 [00:06<00:00, 13.82it/s]
Epoch 8 Test  Loss 0.6983 Err: 0.5060: 100%|████████████████████████████████| 2/2 [00:00<00:00,  4.53it/s]
TESTING SET RESULTS: Average loss: 0.6979 Err: 0.5100
Epoch 9 Train Loss 0.6882 Err: 0.4500: 100%|██████████████████████████████| 90/90 [00:06<00:00, 13.97it/s]
Epoch 9 Test  Loss 0.6986 Err: 0.5060: 100%|████████████████████████████████| 2/2 [00:00<00:00,  4.12it/s]
TESTING SET RESULTS: Average loss: 0.6974 Err: 0.5100
Epoch 10 Train Loss 0.6878 Err: 0.4100: 100%|█████████████████████████████| 90/90 [00:06<00:00, 13.93it/s]
Epoch 10 Test  Loss 0.6985 Err: 0.5060: 100%|███████████████████████████████| 2/2 [00:00<00:00,  4.57it/s]
TESTING SET RESULTS: Average loss: 0.6970 Err: 0.5100
Epoch 11 Train Loss 0.6875 Err: 0.4100: 100%|█████████████████████████████| 90/90 [00:06<00:00, 14.09it/s]
Epoch 11 Test  Loss 0.6981 Err: 0.5060: 100%|███████████████████████████████| 2/2 [00:00<00:00,  4.25it/s]
TESTING SET RESULTS: Average loss: 0.6974 Err: 0.5100
Epoch 12 Train Loss 0.6830 Err: 0.3900: 100%|█████████████████████████████| 90/90 [00:06<00:00, 13.90it/s]
Epoch 12 Test  Loss 0.6983 Err: 0.5120: 100%|███████████████████████████████| 2/2 [00:00<00:00,  4.35it/s]
TESTING SET RESULTS: Average loss: 0.6988 Err: 0.5130
Epoch 13 Train Loss 0.6857 Err: 0.4100: 100%|█████████████████████████████| 90/90 [00:06<00:00, 13.26it/s]
Epoch 13 Test  Loss 0.6980 Err: 0.5060: 100%|███████████████████████████████| 2/2 [00:00<00:00,  4.44it/s]
TESTING SET RESULTS: Average loss: 0.6977 Err: 0.5100
Epoch 14 Train Loss 0.6796 Err: 0.4500: 100%|█████████████████████████████| 90/90 [00:06<00:00, 13.64it/s]
Epoch 14 Test  Loss 0.6982 Err: 0.4860: 100%|███████████████████████████████| 2/2 [00:00<00:00,  4.52it/s]
TESTING SET RESULTS: Average loss: 0.6989 Err: 0.5030
Epoch 15 Train Loss 0.6886 Err: 0.4800: 100%|█████████████████████████████| 90/90 [00:06<00:00, 13.89it/s]
Epoch 15 Test  Loss 0.6974 Err: 0.5060: 100%|███████████████████████████████| 2/2 [00:00<00:00,  4.44it/s]
TESTING SET RESULTS: Average loss: 0.6960 Err: 0.5100
Epoch 16 Train Loss 0.6856 Err: 0.4500: 100%|█████████████████████████████| 90/90 [00:06<00:00, 13.99it/s]
Epoch 16 Test  Loss 0.6979 Err: 0.5080: 100%|███████████████████████████████| 2/2 [00:00<00:00,  4.22it/s]
TESTING SET RESULTS: Average loss: 0.6997 Err: 0.5080
Epoch 17 Train Loss 0.6784 Err: 0.3800: 100%|█████████████████████████████| 90/90 [00:06<00:00, 14.12it/s]
Epoch 17 Test  Loss 0.7000 Err: 0.5120: 100%|███████████████████████████████| 2/2 [00:00<00:00,  4.32it/s]
TESTING SET RESULTS: Average loss: 0.7011 Err: 0.5130
Epoch 18 Train Loss 0.0310 Err: 0.0000:  49%|██████████████▏              | 44/90 [00:03<00:04, 11.23it/s]invalid on input
invalid on input
invalid on input
invalid on input
invalid on input
invalid on input
Epoch 18 Train Loss 0.0234 Err: 0.0000:  51%|██████████████▊              | 46/90 [00:03<00:03, 11.16it/s]invalid on input
invalid on input
invalid on input
[...]

What could be wrong?

xflash96 commented 5 years ago

Sorry for the delayed reply. The "invalid on input" warning (satnet_cpp:194) means that there are NaN or Inf values in the gradient, which didn't happen during our tests. Could you describe your environment (CPU spec, numpy/pytorch version) so we can reproduce the bug?
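
To narrow down where the NaN or Inf values mentioned above first appear, one option is to check the gradients after each backward pass. The following is a minimal sketch, not code from SATNet: the helper `find_nonfinite` and the sample values are hypothetical, and it works on plain Python floats so that it runs without any third-party package.

```python
import math
from typing import Dict, Iterable, List


def find_nonfinite(named_values: Dict[str, Iterable[float]]) -> List[str]:
    """Return the names whose values contain NaN or +/-Inf."""
    bad = []
    for name, values in named_values.items():
        # math.isfinite is False for NaN, +Inf and -Inf.
        if not all(math.isfinite(v) for v in values):
            bad.append(name)
    return bad


# Hypothetical gradient values, only for illustration.
grads = {
    "S": [0.1, -0.2, float("nan")],
    "bias": [0.0, 1.5],
}
print(find_nonfinite(grads))
```

With PyTorch, the mapping could be built after `loss.backward()` from `model.named_parameters()`, for example by flattening each `p.grad` with `.flatten().tolist()`, or the same check could be done directly on tensors with `torch.isfinite`. Enabling `torch.autograd.set_detect_anomaly(True)` may also help to locate the operation that produced the bad values, at the cost of slower training.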

fdietze commented 4 years ago

No worries, sorry for my late reply now :)

I haven't found the time to try again yet. I'll report back when I do.

fdietze commented 4 years ago

So I found the time to try again. Still the same problem, but at a later epoch.

Manjaro Linux, Linux 5.3.18-1 (running in VirtualBox)
CPU: Intel i7-8550U
Python 3.8.1
numpy 1.18.0
torch 1.3.1

Tell me if you need more information.

Thanks for your help!
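
For anyone else reporting this problem, the environment details requested above can be collected with a short script. This is a sketch using only the Python standard library (Python 3.8 or later). The function names are hypothetical, and the package list simply mirrors the two packages asked about in this thread.

```python
import platform
from importlib import metadata  # in the standard library since Python 3.8


def package_version(name: str) -> str:
    """Return the installed version of a package, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("numpy", "torch")) -> str:
    """Build a plain-text summary of OS, CPU, Python and package versions."""
    lines = [
        f"OS: {platform.platform()}",
        # platform.processor() can be empty on some systems; fall back to the
        # machine type in that case.
        f"CPU: {platform.processor() or platform.machine()}",
        f"Python: {platform.python_version()}",
    ]
    lines += [f"{pkg}: {package_version(pkg)}" for pkg in packages]
    return "\n".join(lines)


print(environment_report())
```

The CPU line is only as detailed as the operating system reports it, so the exact model may still need to be added by hand.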

xflash96 commented 3 years ago

Sorry for the late update. I've updated the APIs to work with PyTorch 1.7.0. I also fixed the bug in the CPU version. Could you confirm that it also works on your side?

fdietze commented 3 years ago

Thank you for the update. I'll report back when I try again.