ixaxaar / pytorch-dnc

Differentiable Neural Computers, Sparse Access Memory and Sparse Differentiable Neural Computers, for Pytorch
MIT License

pytorch LTS support (1.8.2) or stable (1.11.1) #60

Open ziegenbalg opened 2 years ago

ziegenbalg commented 2 years ago

Hello!

I was wondering if someone can confirm that this package still runs under PyTorch LTS or current stable (1.11.1)?

I'm getting a curious error. Note that this is for CPU training. Maybe someone can confirm whether this is only broken under CPU training.

Thank you!

`03:44 $ python ./tasks/adding_task.py -lr 0.0001 -rnn_type lstm -memory_type sam -nlayer 1 -nhlayer 1 -nhid 100 -dropout 0 -mem_slot 1000 -mem_size 32 -read_heads 1 -sparse_reads 4 -batch_size 20 -optim rmsprop -input_size 3 -sequence_max_length 100
Namespace(batch_size=20, check_freq=100, clip=50, cuda=-1, dropout=0.0, input_size=3, iterations=2000, lr=0.0001, mem_size=32, mem_slot=1000, memory_type='sam', nhid=100, nhlayer=1, nlayer=1, optim='rmsprop', read_heads=1, rnn_type='lstm', sequence_max_length=100, sparse_reads=4, summarize_freq=100, temporal_reads=2, visdom=False)
Using CPU.

SAM(3, 100, num_hidden_layers=1, nr_cells=1000, read_heads=1, cell_size=32)
SAM(
  (lstm_layer_0): LSTM(35, 100, batch_first=True)
  (rnn_layer_memory_shared): SparseMemory(
    (interface_weights): Linear(in_features=100, out_features=70, bias=True)
  )
  (output): Linear(in_features=132, out_features=3, bias=True)
)

Iteration 0/2000
Falling back to FLANN (CPU).
For using faster, GPU based indexes, install FAISS: "conda install faiss-gpu -c pytorch"
Traceback (most recent call last):
  File "./tasks/adding_task.py", line 222, in <module>
    loss.backward()
  File "/home/eziegenbalg/.conda/envs/default/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/eziegenbalg/.conda/envs/default/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1000]], which is output 0 of AsStridedBackward, is at version 70; expected version 69 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

^C
(default) ✘-INT ~/pytorch-dnc [master|✚ 2]
03:45 $ `
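
For readers hitting the same message: the `RuntimeError` in the log is autograd's generic complaint that a tensor saved during the forward pass was changed in place before `backward()` ran. The sketch below is not taken from pytorch-dnc. It is a minimal, hypothetical reproduction (the function names are made up for illustration) that triggers the same class of error with `exp()` followed by an in-place `add_()`, and shows that the out-of-place form avoids it. Which operation in the SAM memory code actually causes the failure is not established in this thread.

```python
import torch


def inplace_version_fails() -> bool:
    """Return True if backward() raises the in-place modification error."""
    w = torch.ones(3, requires_grad=True)
    y = w.exp()   # exp() saves its output y to compute the gradient later
    y.add_(1.0)   # in-place edit changes y after it was saved
    try:
        y.sum().backward()
    except RuntimeError as err:
        return "modified by an inplace operation" in str(err)
    return False


def out_of_place_version_grad() -> torch.Tensor:
    """Same computation, but the add creates a new tensor, so backward() works."""
    w = torch.ones(3, requires_grad=True)
    y = w.exp()
    z = y + 1.0   # y itself is left untouched
    z.sum().backward()
    return w.grad


if __name__ == "__main__":
    print("in-place variant raises:", inplace_version_fails())
    print("out-of-place gradient:", out_of_place_version_grad())
```

Calling `torch.autograd.set_detect_anomaly(True)` before the forward pass, as the hint in the log suggests, should make PyTorch also report the forward-pass call that created the offending tensor, which is usually the quickest way to locate the in-place write.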

ixaxaar commented 2 years ago

AFAIK I did not really use CPU training except during some testing. Anyway, I need to update PyTorch support, so let me look at it this weekend.

ziegenbalg commented 2 years ago

Will report back here once I can confirm gpu training still works. Setting up env for LTS and 1.11.1 this week.

ziegenbalg commented 2 years ago

Still broken with GPU, I think?

(default) [eziegenbalg@localhost-live pytorch-dnc]$ python ./tasks/adding_task.py -cuda 0 -lr 0.0001 -rnn_type lstm -memory_type sam -nlayer 1 -nhlayer 1 -nhid 100 -dropout 0 -mem_slot 1000 -mem_size 32 -read_heads 1 -sparse_reads 4 -batch_size 20 -optim rmsprop -input_size 3 -sequence_max_length 100
Namespace(batch_size=20, check_freq=100, clip=50, cuda=0, dropout=0.0, input_size=3, iterations=2000, lr=0.0001, mem_size=32, mem_slot=1000, memory_type='sam', nhid=100, nhlayer=1, nlayer=1, optim='rmsprop', read_heads=1, rnn_type='lstm', sequence_max_length=100, sparse_reads=4, summarize_freq=100, temporal_reads=2, visdom=False)
Using CUDA.

SAM(3, 100, num_hidden_layers=1, nr_cells=1000, read_heads=1, cell_size=32, gpu_id=0)
SAM(
  (lstm_layer_0): LSTM(35, 100, batch_first=True)
  (rnn_layer_memory_shared): SparseMemory(
    (interface_weights): Linear(in_features=100, out_features=70, bias=True)
  )
  (output): Linear(in_features=132, out_features=3, bias=True)
)

Iteration 0/2000
Falling back to FLANN (CPU).
For using faster, GPU based indexes, install FAISS: conda install faiss-gpu -c pytorch
Traceback (most recent call last):
  File "./tasks/adding_task.py", line 222, in <module>
    loss.backward()
  File "/home/eziegenbalg/.conda/envs/default/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/eziegenbalg/.conda/envs/default/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1000]], which is output 0 of ScatterBackward0, is at version 57; expected version 56 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

(default) [eziegenbalg@localhost-live pytorch-dnc]$ conda install faiss-gpu -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: done

All requested packages already installed.

(default) [eziegenbalg@localhost-live pytorch-dnc]$ nvidia-smi
Sun Jun 12 10:19:06 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:0C.0 Off |                  N/A |
| 32%   32C    P8    14W / 215W |      1MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

(default) [eziegenbalg@localhost-live pytorch-dnc]$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="36 (Workstation Edition)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
(default) [eziegenbalg@localhost-live pytorch-dnc]$

ziegenbalg commented 2 years ago

@ixaxaar have you had a chance to see if this works under the new pytorch LTS version?

Marchetz commented 2 years ago

Hi, I'm continuing this issue to ask the same thing. Over the past few days I have been trying to use the SDNC and SAM architectures on GPU, but I have run into many problems with FAISS and related libraries and packages. The DNC model, by contrast, works perfectly. I think I have installed all the necessary packages. I would like to know whether these two architectures support the new PyTorch version. If they do, it means I did something wrong during the installation process.

Thank you for the repository!!
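
When comparing setups in a thread like this, it can help to post the exact installed versions alongside the traceback. The snippet below is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_versions` is hypothetical, and the distribution names in the example list are assumptions: conda-installed builds in particular may register under a different name or not appear at all, in which case `conda list` is the better source.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if not found."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match your environment.
    for name, version in installed_versions(["torch", "faiss-gpu", "faiss-cpu"]).items():
        print(f"{name}: {version or 'not found'}")
```

A `None` result only means that no package metadata was found under that name, not necessarily that the library is missing.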