Closed renjithravindran closed 2 years ago
As it stands, alas, that is expected. Now, it'd be possible to get around that by using the class used within https://pytorch.org/docs/master/generated/torch.nn.utils.parametrizations.orthogonal.html
In this one, there's the orthogonal_map="householder" option that should be notably more efficient for very tall matrices like yours. In particular, you should use the _Orthogonal class within https://pytorch.org/docs/master/_modules/torch/nn/utils/parametrizations.html#orthogonal in place of geotorch.SO together with the option use_trivialization=False. That may work much better for your problem, as all the other options would instantiate a matrix of size 30000 x 50 and that one wouldn't.
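For reference, a minimal sketch of what that could look like. It uses the public torch.nn.utils.parametrizations.orthogonal function (which wraps the _Orthogonal class) rather than instantiating the private class directly. The layer size is a small stand-in chosen for illustration, not the 30000 x 50 layer from the report, and whether this is fast enough at full size is untested here.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrizations

torch.manual_seed(0)

# Small stand-in for the large embedding layer (sizes are illustrative only).
num_embeddings, dim = 300, 5
emb = nn.Embedding(num_embeddings, dim)

# Register the Householder-based orthogonal parametrization on the weight,
# without the trivialization (no fixed "base" matrix is stored).
parametrizations.orthogonal(
    emb,
    name="weight",
    orthogonal_map="householder",
    use_trivialization=False,
)

# Accessing emb.weight now recomputes a matrix with orthonormal columns
# from the unconstrained parameter stored under emb.parametrizations.
W = emb.weight
gram_error = (W.T @ W - torch.eye(dim)).abs().max().item()
print(f"max |W^T W - I| = {gram_error:.2e}")

# The layer is used as before.
out = emb(torch.tensor([0, 1, 2]))
```

The forward pass of the layer is unchanged; only the way the weight is produced differs.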
I'd recommend you monkey-patch your way to victory here. This would mean: take the LowRank class, for example, and overwrite the static method def manifolds(n, k, rank, tensorial_size, triv): with one that returns two torch.nn.utils.parametrizations._Orthogonal instances rather than Stiefel ones. I'd reckon that should do (modulo perhaps monkey-patching the _Orthogonal class with a couple of extra methods).
Again, thanks for your lightning-fast responses! So, to make this work, you suggest I use the _Orthogonal class from torch.nn.utils.parametrizations instead of geotorch.SO, with orthogonal_map="householder" and use_trivialization=False. This much is clear!
But I don't understand the next steps: how is the LowRank class associated with what I am trying to do?
As of now I have a fair intuition about the math behind these, so I think I can monkey-patch as required. However, do you think that, with the right modifications, I should be able to get reasonable computational costs for the size of matrices I am interested in?
Thanks a lot!
Also, with these modifications, will things more or less look like the technique described here?
I meant LowRank as an example of a class that does some SVD-like factorisation. I figured you were using one from geotorch.
If you're using the Stiefel class within your own class, then things are much simpler. Simply replace it with the _Orthogonal class from PyTorch and you should be good.
About that paper: in some sense, yes. That paper is a number of years old, and I'm not 100% sure that their implementation at the time was correct. Now, I'm pretty certain that the implementation in _Orthogonal is correct, and it should be fairly efficient, as it uses cuBLAS behind the scenes.
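A sketch of such an SVD-like factorisation, with the orthogonal factors coming from core PyTorch, might look as follows. The LowRankSVD module, its U, S, V attribute names, the sizes, and the training settings are all hypothetical and chosen for illustration; the sketch goes through the public orthogonal function instead of the private _Orthogonal class, and it is not geotorch's LowRank implementation.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrizations


class LowRankSVD(nn.Module):
    """Hypothetical module approximating an n x m matrix as U diag(S) V^T."""

    def __init__(self, n, m, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(n, rank))
        self.V = nn.Parameter(torch.randn(m, rank))
        self.S = nn.Parameter(torch.ones(rank))

    def forward(self):
        # Scale the columns of U by S, then multiply by V^T.
        return (self.U * self.S) @ self.V.T


def make_model(n, m, rank):
    model = LowRankSVD(n, m, rank)
    # Constrain U and V to have orthonormal columns via Householder reflections.
    for name in ("U", "V"):
        parametrizations.orthogonal(
            model, name, orthogonal_map="householder", use_trivialization=False
        )
    return model


torch.manual_seed(0)
target = torch.randn(20, 3) @ torch.randn(3, 8)  # synthetic rank-3 matrix
model = make_model(20, 8, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = (model() - target).pow(2).mean()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# U keeps orthonormal columns throughout training.
U = model.U
u_error = (U.T @ U - torch.eye(3)).abs().max().item()
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Only S and the unconstrained parameters behind U and V are optimised; the orthogonality constraint is enforced by the parametrization on every access rather than by a projection step.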
Okay, actually I am trying to do SVD with gradient descent. But what I am really trying to do is Tucker decomposition; SVD is only a first step.
Let me try out your suggestions.
Thanks!!
Any news?
Hi Lezcano, glad you asked. I haven't got around to trying what you suggested. I broke the work I was doing into two parts: one that could use a more classical way (HOOI) of doing Tucker, and the other using gradients. I have been busy with the first part. Meanwhile, I did try an implementation of the Householder matrix technique for orthogonality, but that also gave me an OOM.
Thanks
That technique is not the same as the one implemented in parametrizations.orthogonal in core PyTorch. I very much encourage you to use the one in core PyTorch, see if it works for your use case.
Yes, i do intend to try it out! Thanks
any news on this end?
Closing for now as this is expected. We should consider adding the householder parametrisation from core and just roll with that one.
Hi, I was testing geotorch to do some SVD. Unfortunately, registering an orthogonal parametrization on a large embedding layer (30000x50) takes around 20 minutes, and the process gets killed when training starts. FYI: this is on PyTorch 1.10.1, CPU.
Is there anything that can change this?
Thanks