Closed: Silk760 closed this issue 2 years ago
It is really hard to debug code based on this small snippet. The layer could be uninitialised, the learning rate could be wrong, etc. I recommend you explicitly initialise your layer with the desired variance.
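For reference, a minimal sketch of what explicit initialisation could look like, assuming the layer's factors are exposed as ordinary `nn.Parameter`s; the helper name `init_layer_` and the standard deviation are illustrative placeholders, not part of any specific API:

```python
import torch.nn as nn

def init_layer_(layer, std=0.02):
    """Re-initialise every parameter of `layer` from N(0, std^2).

    `std` is an illustrative value; choose it so the layer's output
    variance matches what the rest of the network expects.
    """
    for name, p in layer.named_parameters():
        if p.dim() > 1:           # weight / factor tensors
            nn.init.normal_(p, mean=0.0, std=std)
        else:                     # biases
            nn.init.zeros_(p)

# usage, assuming `trl` is the tensor regression layer in your model:
# init_layer_(trl, std=0.02)
```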
In your code, why do you flatten the input if you want to use a TRL? You're essentially just using a matrix factorisation in that way. I'll upload an example notebook when I get a moment.
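To illustrate the point about flattening, a hedged sketch contrasting the two setups. The `tltorch.TRL(input_shape, output_shape, rank)` constructor below is assumed from TensorLy-Torch and may differ from the API version used in this issue:

```python
import torch
import tltorch  # TensorLy-Torch; the TRL constructor below is an assumed API

# Flattening first: the layer only ever sees a matrix, so the "TRL" reduces
# to a factorised fully connected layer and the spatial structure is lost.
x = torch.randn(8, 64, 7, 7)        # (batch, channels, height, width)
x_flat = x.reshape(8, -1)           # (8, 3136): plain matrix regression

# Keeping the tensor structure: hand the feature map to the TRL directly,
# declaring the per-sample input shape so every mode keeps its meaning.
trl = tltorch.TRL(input_shape=(64, 7, 7),   # modes of a single sample
                  output_shape=(10,),       # e.g. 10 classes
                  rank='same')
out = trl(x)                                # -> (8, 10), no flattening needed
```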
Closing as inactive; feel free to reopen if you still have the issue.
An example of using a TRL in a network would be good, because it is proving non-trivial to use. Any idea where this is on the priority list?
I am trying to use a tensor regression layer instead of the fully connected layer in my model. The paper claims I can simply replace the fully connected layer, which removes the need to flatten the tensor and keeps the spatial information, improving the model's accuracy.
I tried this with a small network:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # ... (rest of the network definition was not included in the issue)
```
When training the model, the loss is always NaN. An example showing how to use the tensor regression layer and the tensor contraction layer in state-of-the-art models would be very helpful.
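In the absence of the promised notebook, here is a hedged, minimal end-to-end sketch of the pattern being asked for: a tiny CNN whose final fully connected layer is replaced by a tensor regression layer applied to the unflattened feature map. It assumes TensorLy-Torch's `tltorch.TRL(input_shape, output_shape, rank)` interface; the architecture, names, and hyper-parameters are illustrative, not taken from the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import tltorch  # TensorLy-Torch; assumed API, see note above


class Net(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        # For 28x28 inputs the feature map reaching the TRL is (batch, 32, 7, 7),
        # so we declare that shape directly -- no flattening.
        self.trl = tltorch.TRL(input_shape=(32, 7, 7),
                               output_shape=(num_classes,),
                               rank='same')

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # (B, 16, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))   # (B, 32, 7, 7)
        return self.trl(x)                     # (B, num_classes)


# Quick smoke test on random data; a small learning rate and a sensible
# initial variance help avoid the NaN losses reported above.
model = Net()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x), y)
loss.backward()
opt.step()
print(loss.item())
```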