For the line here: https://github.com/microsoft/cliffordlayers/blob/9248979d747d13ec550282e2325a626484cfd753/cliffordlayers/nn/functional/batchnorm.py#L67
In my case, the scale of the feature matrix is up to 1e10 (in my opinion this is itself a problem, but I'll leave that for a separate discussion). When multiplying X with X^T in a batched way, `cov = torch.matmul(X, X.transpose(-1, -2)) / X.shape[-1]`, the resulting covariance matrix can fail to be positive-definite due to floating-point error. But if I extract the problematic matrix from the batch and do the multiplication on its own, the result is positive-definite (line 70, `U = torch.linalg.cholesky(cov + eye).mH`, runs without error). So, to prevent this numerical instability, I modify the eye matrix as follows:
1. Take the maximum value of each batch and build, per batch, a diagonal matrix with that maximum as its diagonal values, named `max_values`.
2. Multiply `max_values` by 1e-5 (the `eps` parameter) and add the result to the `cov` matrix.
3. Perform the Cholesky decomposition.
Whether choosing the max value as the diagonal value is the right approach might need to be discussed. Here's the small modification. Thank you for your brilliant work!
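Below is a minimal sketch of what the change could look like, written as a standalone helper (`cholesky_with_scaled_jitter` is a hypothetical name) and reading "the max value of each batch" as the per-batch maximum of the covariance entries; the actual patch would sit inline in `clifford_batch_norm` in `batchnorm.py`:

```python
import torch

def cholesky_with_scaled_jitter(X: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Batched covariance, as on line 67 of batchnorm.py.
    cov = torch.matmul(X, X.transpose(-1, -2)) / X.shape[-1]

    # Per-batch maximum of the covariance entries, reshaped so it
    # broadcasts against the trailing (n, n) matrix dimensions.
    max_values = cov.flatten(start_dim=-2).max(dim=-1).values[..., None, None]

    # Jitter scaled to the data's magnitude: eps * max on the diagonal,
    # instead of the fixed identity `eye` used upstream.
    n = cov.shape[-1]
    eye = torch.eye(n, dtype=cov.dtype, device=cov.device)

    # Cholesky now tolerates batches whose covariance became
    # indefinite through float error at feature scales around 1e10.
    return torch.linalg.cholesky(cov + eps * max_values * eye).mH
```

The point of scaling by the per-batch max is that the regularization becomes relative to the data's scale: a fixed `eps * I` that stabilizes features of order 1 is negligible next to covariance entries of order 1e10.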