lava-nc / lava-dl

Deep Learning library for Lava
https://lava-nc.org
BSD 3-Clause "New" or "Revised" License

error while using Recurrent block in lava-dl #271

Open franzhd opened 9 months ago

franzhd commented 9 months ago

**Describe the bug**
I'm trying to train a network with recurrent neurons, but I'm not able to because of an error inside lava-dl. The code of the network itself is posted below.

**To reproduce current behavior**
Steps to reproduce the behavior:

  1. When I run this code:

    
    # imports needed to run this snippet
    import h5py
    import matplotlib.pyplot as plt
    import torch

    import lava.lib.dl.slayer as slayer


    class Network(torch.nn.Module):
        def __init__(self, input, output, threshold, voltage_decay):
            super(Network, self).__init__()

            # CUBA neuron parameters shared by all blocks
            neuron_params = {
                'threshold'     : threshold,
                'current_decay' : 1,
                'voltage_decay' : voltage_decay,
                'tau_grad'      : 1,
                'scale_grad'    : 1,
            }
            neuron_params_drop = {
                **neuron_params,
                'dropout': slayer.neuron.Dropout(p=0.1),
            }

            self.blocks = torch.nn.ModuleList([
                slayer.block.cuba.Dense(neuron_params_drop, input, 128, weight_norm=True, delay=True),
                slayer.block.cuba.Recurrent(neuron_params_drop, 128, 256, weight_norm=True, delay=False),
                slayer.block.cuba.Recurrent(neuron_params_drop, 256, 256, weight_norm=True, delay=False),
                slayer.block.cuba.Dense(neuron_params, 256, 128, weight_norm=True, delay=True),
                slayer.block.cuba.Dense(neuron_params, 128, output),
            ])

        def forward(self, x):
            count = []
            event_cost = 0

            # forward computation is as simple as calling the blocks in a loop
            x = self.blocks[0](x)
            x = self.blocks[1](x)
            x = self.blocks[2](x)
            x = self.blocks[3](x)
            x = self.blocks[4](x)
            # if hasattr(block, 'neuron'):
            #     event_cost += event_rate_loss(x)
            #     count.append(torch.sum(torch.abs((x[..., 1:]) > 0).to(x.dtype)).item())

            return x  # , event_cost, torch.FloatTensor(count).reshape((1, -1)).to(x.device)

        def grad_flow(self, path):
            # helps monitor the gradient flow
            grad = [b.synapse.grad_norm for b in self.blocks if hasattr(b, 'synapse')]

            plt.figure()
            plt.semilogy(grad)
            plt.savefig(path + 'gradFlow.png')
            plt.close()

            return grad

        def export_hdf5(self, filename):
            # network export to hdf5 format
            h = h5py.File(filename, 'w')
            layer = h.create_group('layer')
            for i, b in enumerate(self.blocks):
                b.export_hdf5(layer.create_group(f'{i}'))
  2. I get this error:

    Traceback (most recent call last):
      File "/root/lava-dl_experiment/src/experiment13.py", line 194, in <module>
        main()
      File "/root/lava-dl_experiment/src/experiment13.py", line 152, in main
        output = assistant.train(input.to(device), label)
      File "/opt/conda/lib/python3.10/site-packages/lava/lib/dl/slayer/utils/assistant.py", line 121, in train
        output = self.net(input)
      File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/root/lava-dl_experiment/src/experiment13.py", line 56, in forward
        x = self.blocks[1](x)
      File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/opt/conda/lib/python3.10/site-packages/lava/lib/dl/slayer/block/base.py", line 1476, in forward
        x = recurrent.custom_recurrent(z, self.neuron, self.recurrent_synapse)
      File "/opt/conda/lib/python3.10/site-packages/lava/lib/dl/slayer/utils/recurrent.py", line 42, in custom_recurrent
        return CustomRecurrent.apply(z, neuron, recurrent_mat)
      File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
        return super().apply(*args, **kwargs)  # type: ignore[misc]
      File "/opt/conda/lib/python3.10/site-packages/lava/lib/dl/slayer/utils/recurrent.py", line 64, in forward
        feedback = torch.matmul(spike[..., 0], recurrent_mat_T)
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
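The failing matmul mixes a CUDA spike tensor with a CPU `recurrent_mat_T`, so something belonging to the Recurrent block apparently never reaches the GPU even though the whole net was moved with `.to(device)`. A minimal sketch to list whatever stays behind (hypothetical debugging aid, not part of my script; it assumes `net` is the Network above after `.to(device)`):

    # List every parameter/buffer that is still on the CPU after net.to(device).
    for name, tensor in list(net.named_parameters()) + list(net.named_buffers()):
        if tensor.device.type != 'cuda':
            print(f'still on CPU: {name} {tuple(tensor.shape)}')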



**Expected behavior**
The net trains on the GPU: the Recurrent blocks keep all of their tensors on the same device as the rest of the network, just as the Dense blocks do.


**Environment (please complete the following information):**
 - Docker image: https://github.com/franzhd/lava-dl_docker
 - Lava version: 0.9.0
 - lava-dl version: 0.5.0

**Additional context**
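As a workaround I would try forcing the recurrent synapses onto the GPU before training; this is an untested sketch, where `recurrent_synapse` is the attribute name visible in the traceback and `net`/`device` stand for my model and CUDA device. If that does not help, another thing to check is whether the error disappears with `weight_norm=False` on the Recurrent blocks.

    # Untested workaround sketch: explicitly move the recurrent synapses to
    # the device; `recurrent_synapse` is the attribute from the traceback.
    for block in net.blocks:
        if hasattr(block, 'recurrent_synapse'):
            block.recurrent_synapse.to(device)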