marlin-codes / hyperbolicTransformer

Pytorch implementation for Hypformer: Exploring Efficient Hyperbolic Transformer Fully in Hyperbolic Space (KDD 2024)
MIT License

Question Regarding Implementation of Hyperbolic Operations in Hypformer Codebase #1

Closed github202308 closed 4 weeks ago

github202308 commented 2 months ago

Hello, in your paper “Hypformer: Exploring Efficient Hyperbolic Transformer Fully in Hyperbolic Space,” you mentioned that you implemented operations such as the linear transformation layer, LayerNorm layer, activation functions, and Dropout in hyperbolic space. However, why can’t I find the specific operations in your code under Hypformer/manifolds/layer.py? It still seems to be operations in Euclidean space. Could you explain this?

marlin-codes commented 2 months ago

Thank you for your question regarding the implementation.

At first glance, the operations in the code under Hypformer/manifolds/layer.py may appear to resemble their Euclidean counterparts. However, these operations are fully implemented in hyperbolic space, specifically utilizing the Lorentz model of hyperbolic geometry.

The key difference lies in how we redefine common Transformer components entirely within hyperbolic space using two foundational blocks: Hyperbolic Transformation with Curvatures (HTC) and Hyperbolic Readjustment and Refinement with Curvatures (HRC). These blocks allow us to handle transformations, LayerNorm, activations, and Dropout while respecting the constraints of hyperbolic geometry.

The core idea is to decompose a hyperbolic vector x into its time-like (x_time) and space-like (x_space) components. Operations such as LayerNorm, Dropout, and activations are applied to the space-like part, after which the time-like component is recalculated to ensure the entire vector remains valid within the hyperbolic manifold. Let’s break this process down step-by-step with an example:


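Hyperbolic Vector Representation: Given a vector x = [x_time, x_space], for example x = [1.414, 0.6, 0.8], where x_space = [0.6, 0.8] and x_time = sqrt(0.6^2 + 0.8^2 + 1) ≈ 1.414 (with k = 1).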

Hyperbolic LayerNorm:

  1. Apply LayerNorm to the space-like components (the output values here are illustrative): x_space: LayerNorm([0.6, 0.8]) -> [0.5, 0.7]

  2. Recalculate the time-like component using the formula x_time = sqrt(||x_space||^2 + k), where k is the curvature parameter of the hyperbolic space (the curvature is -1/k; here k = 1): x_time: sqrt((0.5^2 + 0.7^2) + 1) = sqrt(1.74) ≈ 1.319

  3. Combine the updated components: x = [x_time, x_space] -> [1.319, 0.5, 0.7]

  4. If transitioning between hyperbolic spaces with different curvature parameters k1 and k2, apply a curvature adjustment: x = x * sqrt(k2 / k1)

Thus, after transformation, the original hyperbolic vector [1.414, 0.6, 0.8] becomes [1.319, 0.5, 0.7].
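The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in Hypformer/manifolds/layer.py: the function names (`layer_norm`, `time_from_space`, `apply_to_space`, `rescale_curvature`) are hypothetical, the LayerNorm has no learnable scale or shift, and lists stand in for tensors. Note that an unparameterised LayerNorm of [0.6, 0.8] returns roughly [-1, 1] rather than the illustrative [0.5, 0.7].

```python
import math

def layer_norm(values, eps=1e-5):
    """Plain LayerNorm without learnable scale or shift (an assumption)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def time_from_space(space, k=1.0):
    """Recompute the time-like coordinate so that -t^2 + ||space||^2 = -k."""
    return math.sqrt(sum(s * s for s in space) + k)

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def apply_to_space(x, fn, k=1.0):
    """Apply fn to the space-like part of x, then rebuild the time-like part."""
    new_space = fn(x[1:])
    return [time_from_space(new_space, k)] + new_space

def rescale_curvature(x, k_in, k_out):
    """Step 4: scale every coordinate by sqrt(k_out / k_in)."""
    scale = math.sqrt(k_out / k_in)
    return [scale * v for v in x]

# Start from the example point with x_space = [0.6, 0.8] and k = 1.
x = [time_from_space([0.6, 0.8]), 0.6, 0.8]
y = apply_to_space(x, layer_norm, k=1.0)       # steps 1 to 3
z = rescale_curvature(y, k_in=1.0, k_out=2.0)  # step 4

# Each point should satisfy <p, p>_L = -k for its own k.
print(lorentz_inner(x, x), lorentz_inner(y, y), lorentz_inner(z, z))
```

Whatever `fn` does to the space-like part, recomputing the time-like coordinate puts the result back on the hyperboloid, which is why the three printed values come out close to -1, -1 and -2.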


In the code, these operations are reflected in key classes such as:

So although the code may look like its Euclidean counterpart at first glance, the underlying math is distinct: we handle the Lorentzian inner product and the curvature adjustments explicitly, so that the geometry of the hyperbolic space is preserved throughout.

github202308 commented 2 months ago

Thank you for your detailed and insightful explanation regarding the implementation of hyperbolic operations in the “Hypformer” code, specifically under Hypformer/manifolds/layer.py.

I have carefully reviewed your response and the provided code. I now understand that the operations, especially LayerNorm, activations, and Dropout, are indeed applied to the space-like components of the hyperbolic vectors, while ensuring that the time-like component is recalculated appropriately to maintain the validity within the hyperbolic manifold.

However, I would like to clarify one specific point to ensure my complete understanding:

When extracting the space-like component x_space from a hyperbolic feature vector x, it is interpreted as being in Euclidean space, correct? Thus, directly applying LayerNorm (a Euclidean operation) to x_space is reasonable because x_space behaves as a Euclidean vector in this context. Following this, recalculating the time-like component ensures the resulting vector remains valid within the hyperbolic space.

Your insight on this particular aspect would be greatly appreciated to confirm my understanding.

Thank you once again for your support and assistance.

marlin-codes commented 1 month ago

Thanks for your question!

1. Applying Transformations Directly to $x_{\text{space}}$ $\to$ special Lorentz Rotation

While we can split the space-like component $x_{\text{space}}$ from the feature vector, this component is not interpreted as being in Euclidean space. Directly transforming $x_{\text{space}}$ actually corresponds to performing a special Lorentz rotation on the original hyperbolic vector, not a simple Euclidean operation.

The updated vector can be expressed as:

$$x_{\text{new}} = \left(\sqrt{\|f(x_{\text{space}})\|^2 + K},\; f(x_{\text{space}})\right),$$

where the curvature $K$ is related to the hyperbolic geometry (with curvature $=-1/K$).
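As a quick sanity check, assuming the usual Lorentzian inner product $\langle x, y\rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i \ge 1} x_i y_i$ (the sign convention is an assumption, chosen to match the square root above), the rebuilt vector satisfies the manifold constraint for any map $f$:

$$\langle x_{\text{new}}, x_{\text{new}}\rangle_{\mathcal{L}} = -\left(\|f(x_{\text{space}})\|^2 + K\right) + \|f(x_{\text{space}})\|^2 = -K.$$

So the output always lies on the hyperboloid with curvature $-1/K$, whether $f$ is a LayerNorm, an activation, or Dropout.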

This leads to the following matrix formulation:

Here, the right-hand side represents the hyperbolic vector $x$, while the left-hand matrix represents a specific Lorentz group operation. Essentially, applying transformations directly to $x_{\text{space}}$ equates to performing a special Lorentz rotation on the hyperbolic vector (if we discard the orthogonal constraint). This is defined as follows:


Lorentz Rotation

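A Lorentz rotation describes a spatial coordinate rotation. The Lorentz rotation matrices are expressed as:

$$ \mathbf{R} = \begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & \tilde{\mathbf{R}} \end{bmatrix}, $$
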
where $\tilde{\mathbf{R}}$ satisfies:

$$ \tilde{\mathbf{R}}^T \tilde{\mathbf{R}} = \mathbf{I}, \quad \det(\tilde{\mathbf{R}}) = 1, $$

indicating that $\tilde{\mathbf{R}} \in SO(n)$, meaning it is a special orthogonal matrix.

Lorentz Boost

In contrast, Lorentz boosts describe relative motion with constant velocity. Given a velocity:

$$\mathbf{v} \in \mathbb{R}^n \quad (\text{in relation to the speed of light}),$$

with the condition $\|\mathbf{v}\| < 1$ and the Lorentz factor $\gamma$ defined as:

$$\gamma = \frac{1}{\sqrt{1 - \|\mathbf{v}\|^2}},$$

the Lorentz boost matrices are given by:

$$ \mathbf{B} = \begin{bmatrix} \gamma & -\gamma \mathbf{v}^T \\ -\gamma \mathbf{v} & \mathbf{I} + \frac{\gamma^2}{1 + \gamma} \mathbf{v} \mathbf{v}^T \end{bmatrix}. $$

In Lorentz boosts, the off-diagonal blocks ($-\gamma \mathbf{v}^T$ and $-\gamma \mathbf{v}$) couple the time and space components. Lorentz rotations, in contrast, have zero off-diagonal blocks, so time and space never mix.
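The difference can be checked numerically. The sketch below is an illustration in plain Python, not code from the repository: it builds a rotation and a boost for $n = 2$ using the block forms described above, with hypothetical helper names (`rotation`, `boost`, `matvec`) and arbitrarily chosen angle and velocity. Both maps preserve the Lorentzian inner product, but only the boost changes the time-like coordinate.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def rotation(theta):
    """Lorentz rotation for n = 2: block-diagonal with 1 and a 2x2 rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def boost(v):
    """Lorentz boost for a velocity v with ||v|| < 1."""
    n = len(v)
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    coef = gamma ** 2 / (1.0 + gamma)
    top = [gamma] + [-gamma * c for c in v]
    rows = [
        [-gamma * v[i]]
        + [(1.0 if i == j else 0.0) + coef * v[i] * v[j] for j in range(n)]
        for i in range(n)
    ]
    return [top] + rows

# A point on the hyperboloid with K = 1.
x = [math.sqrt(0.6 ** 2 + 0.8 ** 2 + 1.0), 0.6, 0.8]

rotated = matvec(rotation(0.3), x)      # angle chosen arbitrarily
boosted = matvec(boost([0.3, 0.4]), x)  # velocity chosen arbitrarily

print("time before:", x[0])
print("time after rotation:", rotated[0])  # unchanged
print("time after boost:", boosted[0])     # changed: time and space are mixed
print(lorentz_inner(rotated, rotated), lorentz_inner(boosted, boosted))
```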


2. Why Use Rotations for ReLU/LayerNorm/Dropout/Concatenation Rather Than Boosts?

Relativity View

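In the framework of Special Relativity, a Lorentz rotation re-orients the spatial axes within a single reference frame, whereas a Lorentz boost relates two reference frames moving relative to each other at constant velocity.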

In our context, we apply rotations, not boosts, because the operations (e.g., ReLU, LayerNorm, Dropout, Concatenation) are all performed within the same time reference frame. Since there is no relative motion before and after the operation, it is natural to use rotations that preserve the separation of space and time components.

Computation View

For a 5-dimensional Lorentz vector in hyperbolic space with a curvature of $-1$, an example of a vector could be:

x = (1.02, 0.1, 0.1, 0.1, 0.1, 0.1)

The first (time-like) component is much larger than the space-like components.

If we applied LayerNorm directly across all components, the time-like component (1.02) would be normalized alongside much smaller space-like components (e.g., 0.1). This discrepancy in scale could lead to instability.
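A small numerical sketch makes the point concrete. It is an illustration in plain Python under stated assumptions, not the repository code: the LayerNorm has no learnable parameters, and the second vector with varied space-like entries is hypothetical (the example above has identical space-like entries, which a space-only LayerNorm would simply map to zeros).

```python
import math

def layer_norm(values, eps=1e-5):
    """Plain LayerNorm without learnable scale or shift (an assumption)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

# Case 1: LayerNorm across all six coordinates of the example vector.
x = [1.02, 0.1, 0.1, 0.1, 0.1, 0.1]
full = layer_norm(x)
full_constraint = lorentz_inner(full, full)

# Case 2: LayerNorm on the space-like part only, then rebuild the time-like part.
space = [0.1, 0.2, -0.1, 0.05, 0.15]  # hypothetical space-like entries
normed_space = layer_norm(space)
rebuilt = [math.sqrt(sum(s * s for s in normed_space) + 1.0)] + normed_space
rebuilt_constraint = lorentz_inner(rebuilt, rebuilt)

print("full-vector LayerNorm:", full)
print("constraint after full-vector LayerNorm:", full_constraint)
print("constraint after space-only LayerNorm:", rebuilt_constraint)
```

In the first case the time-like entry dominates the statistics: it becomes the only positive output, all space-like entries collapse to the same negative value, and the result no longer satisfies $\langle x, x\rangle_{\mathcal{L}} = -1$. In the second case the constraint holds by construction.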

github202308 commented 1 month ago

Thank you very much for your detailed and thorough explanation!

Your explanation has helped me better understand how various operations are implemented in hyperbolic space within Hypformer, especially the parts regarding handling ReLU, LayerNorm, Dropout, and concatenation through Lorentz rotations. Your patient response has provided me with a clearer understanding of the project's implementation details.

Once again, thank you for your support and assistance!

Wishing you continued success and smooth progress on your projects!