mlech26l / ncps

PyTorch and TensorFlow implementation of NCP, LTC, and CfC wired neural models
https://www.nature.com/articles/s42256-020-00237-3
Apache License 2.0

pt_example can't be run using GPU #13

Closed: lk1983823 closed this issue 3 years ago

lk1983823 commented 3 years ago

I want to run the example on the GPU, so I set the following Trainer parameters in pt_example.py:

trainer = pl.Trainer(
    logger=pl.loggers.CSVLogger("log"),
    max_epochs=400,
    progress_bar_refresh_rate=1,
    gradient_clip_val=1,  # Clip gradient to stabilize training
    gpus=1,
)

However, it fails with the following error:

GPU available: True, used: True
TPU available: None, using: 0 TPU cores
/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:50: UserWarning: you defined a validation_step but have no val_dataloader. Skipping validation loop
  warnings.warn(*args, **kwargs)

  | Name  | Type        | Params
--------------------------------------
0 | model | RNNSequence | 350   
--------------------------------------
350       Trainable params
0         Non-trainable params
350       Total params
0.001     Total estimated model params size (MB)
Epoch 0:   0%|                                                                                                                | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "pt_example.py", line 131, in <module>
    trainer.fit(learn, dataloader)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 513, in fit
    self.dispatch()
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in dispatch
    self.accelerator.start_training(self)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 111, in start_training
    self._results = trainer.run_train()
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 644, in run_train
    self.train_loop.run_training_epoch()
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 492, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 650, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 434, in optimizer_step
    using_lbfgs=is_lbfgs,
  File "pt_example.py", line 79, in optimizer_step
    optimizer.optimizer.step(closure=closure)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/torch/optim/adam.py", line 66, in step
    loss = closure()
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 645, in train_step_and_backward_closure
    split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 293, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 157, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 122, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "pt_example.py", line 46, in training_step
    y_hat = self.model.forward(x)
  File "pt_example.py", line 32, in forward
    new_output, hidden_state = self.rnn_cell.forward(inputs, hidden_state)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/kerasncp/torch/ltc_cell.py", line 255, in forward
    next_state = self._ode_solver(inputs, states, elapsed_time)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/p36env/lib/python3.6/site-packages/kerasncp/torch/ltc_cell.py", line 186, in _ode_solver
    sensory_w_activation *= self._params["sensory_sparsity_mask"]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
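
The traceback ends in kerasncp/torch/ltc_cell.py, where a tensor read from self._params["sensory_sparsity_mask"] meets tensors already on cuda:0 and is apparently still on the CPU. The thread does not say how 2.0.1 fixed this; a likely cause (an assumption) is a constant tensor held in a plain Python dict, because nn.Module.to() and .cuda() only move registered parameters and buffers. The minimal sketch below illustrates that general PyTorch behavior with hypothetical classes DictMaskCell and BufferMaskCell, not keras-ncp's actual code. It uses a dtype cast as a CPU-only stand-in for a device move, since .to() treats both the same way.

```python
import torch
import torch.nn as nn


class DictMaskCell(nn.Module):
    """Hypothetical cell that keeps a constant mask in a plain dict (the failure mode)."""

    def __init__(self, n: int):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(n, n))
        # Plain tensors in a dict are invisible to nn.Module, so .to()/.cuda() skip them.
        self._params = {"sparsity_mask": torch.ones(n, n)}

    def forward(self, x):
        return x @ (self.weight * self._params["sparsity_mask"])


class BufferMaskCell(nn.Module):
    """Hypothetical fix: the mask is registered as a buffer so it follows the module."""

    def __init__(self, n: int):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(n, n))
        # Buffers are moved/cast by .to() and saved in state_dict, but are not trained.
        self.register_buffer("sparsity_mask", torch.ones(n, n))

    def forward(self, x):
        return x @ (self.weight * self.sparsity_mask)


def mask_dtypes(dtype=torch.float64):
    """Cast both cells with .to(dtype) (stand-in for .to('cuda')) and return the mask dtypes."""
    dict_cell = DictMaskCell(3).to(dtype)
    buf_cell = BufferMaskCell(3).to(dtype)
    return dict_cell._params["sparsity_mask"].dtype, buf_cell.sparsity_mask.dtype


if __name__ == "__main__":
    # The dict mask keeps float32; the buffer mask is cast to float64 with the module.
    print(mask_dtypes())
    if torch.cuda.is_available():
        x = torch.ones(2, 3, device="cuda")
        print(BufferMaskCell(3).cuda()(x).device)
        # DictMaskCell(3).cuda()(x) would raise the same "same device" RuntimeError.
```

Registering the mask as a buffer (rather than a Parameter with requires_grad=False) keeps it out of model.parameters(), so the optimizer never sees it, while still letting Lightning's device placement move it.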

I also changed the wiring to:

wiring = kncp.wirings.NCP(
    inter_neurons=12,  # Number of inter neurons
    command_neurons=8,  # Number of command neurons
    motor_neurons=out_features,  # Number of motor neurons
    sensory_fanout=4,  # How many outgoing synapses each sensory neuron has
    inter_fanout=4,  # How many outgoing synapses each inter neuron has
    recurrent_command_synapses=4,  # How many recurrent synapses are in the
    # command neuron layer
    motor_fanin=6,  # How many incoming synapses each motor neuron has
)

The error still exists. Thanks!

mlech26l commented 3 years ago

Thanks for the report. Fixed in version 2.0.1.

lucifer2859 commented 3 years ago

pip freeze  # keras-ncp==2.0.1
print(kncp.__version__)  # 2.0.0

I still have this problem after installing version 2.0.1, even though it prints '2.0.0'.
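
The two numbers above come from different places: pip freeze reads the installed distribution's metadata, while the printed value is a version string defined inside the package itself. A small stdlib-only sketch (Python 3.8+ for importlib.metadata; the helper names installed_version and module_version are hypothetical) prints both side by side. A mismatch can simply mean the hard-coded string was not updated for the release; on its own it does not prove that old code is running.

```python
import importlib
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Hypothetical helper: version recorded in pip's metadata, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_version(module_name: str) -> Optional[str]:
    """Hypothetical helper: the module's self-reported __version__, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    # pip distribution name ("keras-ncp") differs from the import name ("kerasncp").
    print("pip metadata:        ", installed_version("keras-ncp"))
    print("kerasncp.__version__:", module_version("kerasncp"))
```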

mlech26l commented 3 years ago

With 2.0.1, examples/pt_example.py should be able to run fine. What error do you get?

lucifer2859 commented 3 years ago

It works now, thanks.