lezcano / geotorch

Constrained optimization toolkit for PyTorch
https://geotorch.readthedocs.io
MIT License

Is geotorch.orthogonal a mapping from Euclidean space onto the corresponding manifold? #43

Closed: Ronales closed this issue 7 months ago

Ronales commented 7 months ago

Hello lezcano, I am a beginner in the field of manifolds, and I am glad to see your open-source implementation of manifold structures. I have the following questions.

Question 1. I have read your two papers and found that, in the code, both apply orthogonality constraints to linear layers. If an intermediate output is X, does the result Y of passing X through such an orthogonally constrained linear layer also lie on the corresponding manifold?

For example, the following code: https://github.com/lezcano/geotorch/blob/ba38d406c245d609fee4b4dac3f6427bf6d73a8e/examples/sequential_mnist.py#L114

Can I assume that geotorch.orthogonal provides a mapping that projects vectors from Euclidean space onto the corresponding manifold, X -> Y?

Question 2. I see that you apply geotorch.orthogonal in RNNs. Can it also be used for classification with a regular multi-layer perceptron? If geotorch.orthogonal in Question 1 is a projection, do the projected values need special handling, since at that point they are already on the manifold? I see that you use a special nonlinearity layer in the ExpRNNCell class. Is its purpose to project points on the manifold back into Euclidean space?

Code location: https://github.com/lezcano/geotorch/blob/ba38d406c245d609fee4b4dac3f6427bf6d73a8e/examples/sequential_mnist.py#L116

Since you are using an RNN, you use a nonlinearity layer for the conversion. If I use a regular multi-layer perceptron for classification, how can I project the outputs parametrized by geotorch.orthogonal back onto Euclidean space?

Question 3. I want to project feature vectors in Euclidean space (such as the output of a simple linear layer) onto a manifold and then compute the distance between different output features on the manifold. Do you have any suggestions? Can Euclidean metrics be used directly to measure the distance between two different Y values?

Thank you very much for answering these questions, and thank you for open-sourcing such an excellent project!

lezcano commented 7 months ago
  1. What do you mean by "project vectors from the Euclidean space into a manifold"? Into which manifold?
  2. You can use GeoTorch in any network you want, RNN or otherwise.
  3. I think you have confused what lies on the manifold here. This library lets you put constraints on the parameters of a neural network. It does not constrain the inputs, outputs, or intermediate values of the network's computations, just the parameters.
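
To make the distinction between constrained parameters and unconstrained outputs concrete, here is a minimal sketch. It is an illustration under assumptions, not the library's own example: it uses PyTorch's built-in `torch.nn.utils.parametrizations.orthogonal` in place of `geotorch.orthogonal` so that it runs without geotorch installed, and the layer size and input values are arbitrary.

```python
import torch
from torch import nn
from torch.nn.utils.parametrizations import orthogonal

# Constrain the *weight* of a square linear layer to be orthogonal.
# (Assumption: PyTorch's built-in parametrization stands in for geotorch.orthogonal.)
layer = orthogonal(nn.Linear(8, 8, bias=False))

# The weight is recomputed from unconstrained parameters each time it is accessed.
W = layer.weight
weight_error = torch.linalg.norm(W.T @ W - torch.eye(8)).item()

# An ordinary batch of 4 input vectors in R^8 (arbitrary fixed values).
x = torch.arange(32, dtype=torch.float32).reshape(4, 8)
y = layer(x)

# An orthogonal weight preserves the norm of each input vector...
norm_gap = (y.norm(dim=1) - x.norm(dim=1)).abs().max().item()

# ...but the output batch itself is not an orthonormal set of vectors:
# the constraint lives on W, not on y.
output_gram_error = torch.linalg.norm(y @ y.T - torch.eye(4)).item()

print(f"||W^T W - I|| = {weight_error:.2e}")
print(f"max | ||y_i|| - ||x_i|| | = {norm_gap:.2e}")
print(f"||y y^T - I|| = {output_gram_error:.2e}")
```

The first two quantities are close to zero, while the last one is large: the weight matrix lies on the manifold of orthogonal matrices, and the outputs remain ordinary Euclidean vectors.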
Ronales commented 7 months ago

Thank you very much for your prompt reply. Let me clarify what I meant.

1. "Project vectors from the Euclidean space into a manifold" means that if I have a vector output by an intermediate layer of a neural network, I want to map these output features onto a manifold (such as a Stiefel manifold) and then process them on the manifold. For example, in the MNIST handwritten digit classification task, I want to map the features of the fully connected layer from the usual Euclidean space onto the manifold and then obtain the final classification result. It would be even better if you could give me some ideas on this issue.

2 and 3. Based on your answer, can I assume that the purpose of geotorch, or of orthogonal weight constraints in general, is only to improve the network's generalization and reduce the risk of exploding and vanishing gradients?

This is what confuses me:

4. I have read some articles (https://arxiv.org/pdf/1702.00071.pdf). Orthogonality constraints on neural network weights can be imposed in the following ways:
   4.1 Forced orthogonality: for example, performing a singular value decomposition of the weights during training to enforce their orthogonality.
   4.2 Soft orthogonality: for example, adding a regularizer with a penalty term that encourages orthogonality, such as min f(x) + ||W^T W - I||.

The geotorch approach seems to match method 4.1: it uses a parametrization to constrain the weights to be orthogonal. Is its purpose also to improve generalization and reduce the risk of exploding and vanishing gradients? Can I use method 4.2 to get the same effect as method 4.1, perhaps with a different final accuracy?
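
The two methods 4.1 and 4.2 can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the penalty uses the squared Frobenius norm (the norm is not specified above), and the SVD step shows one possible way to force orthogonality, not how geotorch does it.

```python
import torch


def soft_orthogonality_penalty(W: torch.Tensor) -> torch.Tensor:
    """Method 4.2 (sketch): squared Frobenius norm of W^T W - I."""
    k = W.shape[1]
    identity = torch.eye(k, dtype=W.dtype)
    return torch.linalg.norm(W.T @ W - identity) ** 2


def project_to_orthogonal(W: torch.Tensor) -> torch.Tensor:
    """Method 4.1 (sketch): replace W by the closest matrix with
    orthonormal columns, obtained by setting its singular values to 1."""
    U, _, Vh = torch.linalg.svd(W, full_matrices=False)
    return U @ Vh


torch.manual_seed(0)
W = torch.randn(6, 4, dtype=torch.float64)

# An unconstrained random matrix is generally far from orthogonal.
penalty_before = soft_orthogonality_penalty(W).item()

# After the SVD projection the penalty vanishes up to rounding error.
W_hard = project_to_orthogonal(W)
penalty_after = soft_orthogonality_penalty(W_hard).item()

print(f"penalty before projection: {penalty_before:.3e}")
print(f"penalty after projection:  {penalty_after:.3e}")
```

With the soft approach, the penalty would be added to the training loss with some weight (a hypothetical hyperparameter), so the weights are only encouraged to stay near orthogonal. With the forced approach, they are orthogonal up to numerical precision after every projection.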

5. The weights of a neural network are updated continually during training by the chosen optimizer. Can I understand this orthogonalization as optimizing the layer weights by placing them on the corresponding manifold (such as a Stiefel manifold) and updating them according to a manifold optimization method?

Thank you again for your patient answer.

lezcano commented 7 months ago
  1. This library does not work with features. I think you are misunderstanding what this library does and what lies on the manifold. Again, this library constrains parameters, not the outputs of layers.
  2-3. It's for whatever you want to use it for. This library does not limit what you can use it with.
  4. Soft vs. forced orthogonality: you should look into the literature on that topic and run experiments yourself to see if it works for your problem. They are two different approaches to the same issue.
  5. That's exactly what geotorch is doing, yes.

In general, if you have doubts, I would recommend you look at the internal implementation of geotorch to understand better how it works.
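
As a rough illustration of the answer to point 5, the sketch below trains an orthogonally parametrized layer with a plain optimizer and checks that the weight stays on the manifold. It is an assumption-laden example: it uses PyTorch's built-in `torch.nn.utils.parametrizations.orthogonal` rather than `geotorch.orthogonal`, and the data, learning rate, and helper name `orthogonality_error` are made up for the demonstration.

```python
import torch
from torch import nn
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)

# The optimizer only sees unconstrained parameters; the orthogonal weight
# is computed from them on every forward pass.
layer = orthogonal(nn.Linear(5, 5, bias=False))
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)


def orthogonality_error(module: nn.Module) -> float:
    """Hypothetical helper: Frobenius norm of W^T W - I."""
    W = module.weight
    return torch.linalg.norm(W.T @ W - torch.eye(W.shape[1])).item()


# Synthetic regression data, purely for illustration.
x = torch.randn(16, 5)
target = torch.randn(16, 5)

W_before = layer.weight.detach().clone()
for _ in range(10):
    optimizer.zero_grad()
    loss = ((layer(x) - target) ** 2).mean()
    loss.backward()
    optimizer.step()
W_after = layer.weight.detach().clone()

# The weight has moved, yet it is still orthogonal.
weight_change = torch.linalg.norm(W_after - W_before).item()
error_after = orthogonality_error(layer)

print(f"weight change after training: {weight_change:.3e}")
print(f"||W^T W - I|| after training: {error_after:.3e}")
```

No projection step and no special manifold optimizer appear in the loop: ordinary gradient steps on the unconstrained parameters translate into updates that keep the weight orthogonal.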

Ronales commented 7 months ago

Thank you very much for your patient answer. I understand now.