cc @mruberry @ngimel
I like the idea of supporting a dims argument, but let me ask a more general question:
Currently torch.norm and torch.linalg.norm support a variety of norms computed with radically different methods. They almost seem like "clearing houses" or "routers" for picking a norm. Would it be simpler if we implemented the vector norms and the matrix norms that comprise torch.norm and torch.linalg.norm as separate functions, and then had torch.linalg.norm and torch.linalg.vector_norm select among them?
@mruberry, your suggestion makes sense. torch.linalg.norm could simply route directly to at::linalg_vector_norm (the at:: implementation of torch.linalg.vector_norm) when its vector norm conditions are met. This would require at::linalg_vector_norm to offer a dim argument, since torch.linalg.norm can compute a vector norm across just one dimension of an n-dimensional input.
I'm also now thinking that torch.linalg's norm methods should probably not rely on at::norm internally (although relying on norm_stub is still alright, since duplicating that would be a lot of code). That way, it would be simpler to change at::norm without affecting at::linalg_vector_norm, and vice versa. Furthermore, when we replace torch.norm with torch.linalg.norm or torch.linalg.vector_norm, we could also think about replacing at::norm calls with at::linalg_vector_norm calls, and then deleting at::norm.
Related: torch.norm and torch.linalg.norm give different results for some degenerate tensors and extreme values. I will need to find out whether any internal calls rely on torch.norm's behavior in these edge cases. If they do, it's most likely not a good idea to simply replace them; we'd probably need a deprecation strategy.
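One way to pin down those differences is a quick probe script like the sketch below (not part of any planned implementation; the exact outputs depend on the PyTorch version, and the point is only to compare the two functions side by side):
import torch

# Degenerate and extreme inputs that tend to expose behavioral differences.
candidates = [
    torch.empty(0),                       # empty tensor
    torch.tensor([float('inf'), 1.0]),    # contains inf
    torch.tensor([float('nan'), 1.0]),    # contains nan
]

for t in candidates:
    for p in (0, 2, float('inf'), float('-inf')):
        try:
            old = torch.norm(t, p=p).item()
        except RuntimeError as err:
            old = f"error: {err}"
        try:
            new = torch.linalg.norm(t, ord=p).item()
        except RuntimeError as err:
            new = f"error: {err}"
        print(f"input={t.tolist()}, ord={p}: torch.norm -> {old}, torch.linalg.norm -> {new}")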
It's true that linalg.norm could route to linalg.vector_norm when it is computing a vector norm. But I was thinking of something even more fine-grained, like: _matrix_two_norm, _matrix_neg_two_norm, _vector_zero_norm ... These functions don't need to be exposed to users. This might be more readable.
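A rough sketch of what such non-user-facing helpers could compute (the names and these Python-level definitions are purely illustrative, not actual PyTorch internals):
import torch

def _vector_zero_norm(x):
    # "0-norm": number of nonzero elements
    return (x != 0).sum().to(x.dtype)

def _matrix_two_norm(a):
    # matrix 2-norm: largest singular value
    return torch.svd(a, compute_uv=False).S.max()

def _matrix_neg_two_norm(a):
    # matrix -2-norm: smallest singular value
    return torch.svd(a, compute_uv=False).S.min()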
All vector norms and the Frobenius norm are implemented in the same way, via reduction. For matrix norms, splitting into separate functions might indeed make sense, but even then I'd leave the +/-inf and +/-1 matrix norms in the same function, since they share a lot of implementation. That leaves the nuclear norm and the +/-2 norms, which could also live together because they both rely on first computing an SVD. Question: why is the Frobenius norm implemented as
if (self.is_complex()) {
  result_ = at::sqrt(at::sum(at::real(self.conj() * self), dim_, keepdim));
} else {
  result_ = at::sqrt(at::sum((self * self), dim_, keepdim));
}
? This is super inefficient compared to vector norm, and produces the same result.
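For reference, a small Python check of that equivalence using the existing public API on an arbitrary complex matrix:
import torch

a = torch.randn(4, 5, dtype=torch.complex64)

# Frobenius norm computed the way quoted above: sqrt(sum(real(conj(a) * a)))
fro_quoted = torch.sqrt(torch.sum(torch.real(a.conj() * a)))

# Vector 2-norm of the flattened matrix gives the same value,
# and a single reduction kernel can compute it directly.
fro_vector = torch.linalg.norm(a.flatten())

print(torch.allclose(fro_quoted, fro_vector))  # True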
Got it, that makes sense.
Although I'm not sure whether we should separate the vector norm orders into _vector_zero_norm and so on, because that would make it necessary to duplicate the logic for selecting the appropriate vector norm order in both at::linalg_norm and at::linalg_vector_norm. Would it make more sense to just have a _vector_norm that encapsulates this logic?
EDIT: I posted the above paragraph before my page reloaded with Natalia's comment, which makes this point more concisely.
Also, if I understand correctly, with this fine-grained approach at::linalg_norm would need its own specialized entry in derivatives.yaml, in addition to the derivative formula for at::linalg_vector_norm. If, instead, at::linalg_norm called at::linalg_vector_norm, the derivative for at::linalg_norm in the vector case would automatically be traced to that of at::linalg_vector_norm. The fine-grained approach is alright with me if the pros outweigh the cons; I'm just pointing out one of the possible cons.
@ngimel, you're right, the Frobenius norm is the same as a vector 2-norm over all elements, so it should share the same implementation as the vector norm.
I guess the way I'm thinking of it, torch.linalg.norm would either call torch.linalg.vector_norm or a matrix norm implementation, and those functions would have their own derivatives, so torch.linalg.norm would be a composite operation and not need a derivatives.yaml entry. We can VC through the details sometime, but I think we're all happy to proceed with torch.linalg.vector_norm.
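To illustrate the composite-operation idea in Python terms, here is a minimal sketch; the dispatch conditions and the _matrix_norm helper are assumptions for illustration only, and autograd simply differentiates through whichever branch is taken:
import torch

def _matrix_norm(x, ord=None, dim=None, keepdim=False):
    # Hypothetical placeholder for a dedicated matrix norm implementation;
    # here it just falls back to the existing torch.linalg.norm.
    return torch.linalg.norm(x, ord=ord, dim=dim, keepdim=keepdim)

def linalg_norm_sketch(x, ord=None, dim=None, keepdim=False):
    # Decide whether this call is a vector norm or a matrix norm.
    if dim is not None:
        is_vector = isinstance(dim, int) or len(dim) == 1
    else:
        is_vector = x.dim() == 1
    if is_vector:
        # Vector path: delegate to the (proposed) vector norm function, so
        # its derivative formula is reused automatically.
        return torch.linalg.vector_norm(x, 2 if ord is None else ord,
                                        dim=dim, keepdim=keepdim)
    # Matrix path: delegate to the matrix norm implementation.
    return _matrix_norm(x, ord=ord, dim=dim, keepdim=keepdim)

x = torch.randn(3, 4, 5)
print(linalg_norm_sketch(x, dim=1).shape)                   # vector norm over dim 1 -> (3, 5)
print(linalg_norm_sketch(x, ord='fro', dim=(1, 2)).shape)   # matrix norm per batch -> (3,)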
Would there be a separate norm function for matrix_norm?
Update!
From offline discussion with @kurtamohler and @ngimel, for the 1.9 release we plan to:
- add torch.linalg.vector_norm
- add torch.linalg.matrix_norm
In the future we may even want to consider warning people away from torch.linalg.norm so they use one of these two functions instead. From our discussions with other engineers, it seems that most use cases are either vector norms or matrix norms and not a mix of the two, and their conflation in torch.norm and torch.linalg.norm may now be familiar, but is still often confusing for users.
🚀 Feature
Add a function called torch.linalg.vector_norm, which computes the vector norm of an input with more than one dimension.
Motivation
torch.norm has the ability to compute vector norms of n-dimensional inputs, but torch.linalg.norm does not. In order to deprecate torch.norm (issue #49907), we need to offer the missing functionality. Originally, the idea was to add a flag to torch.linalg.norm that offers n-dimensional vector norm support, but that idea was determined to be bad UX design, since the new argument would be incompatible with the behavior of the dim argument in torch.linalg.norm. Instead, it makes more sense to implement a separate torch.linalg.vector_norm function.
Pitch
torch.linalg.vector_norm should have the ability to replace any torch.norm call that computes a vector norm across all dimensions of an n-dimensional tensor. Therefore, it should support all the arguments that torch.norm has, except for the dim argument.
Initial discussion is here: https://github.com/pytorch/pytorch/pull/50055#discussion_r553004871
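As a concrete illustration of the gap, only the first two calls below use existing functions; the last line shows the proposed API and is hypothetical at this point:
import torch

x = torch.randn(2, 3, 4)

# torch.norm with dim=None reduces over all elements of the
# n-dimensional input, i.e. it computes a vector 2-norm.
print(torch.norm(x))

# The same value via torch.linalg.norm requires flattening first.
print(torch.linalg.norm(x.flatten()))

# Proposed: an explicit vector norm function for n-dimensional inputs.
# print(torch.linalg.vector_norm(x))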
Alternatives
torch.linalg.vector_norm could potentially offer a dim argument which would specify which dimensions to compute the vector norm across, instead of computing across all dimensions. In order to replace all torch.norm calls with either torch.linalg.norm or torch.linalg.vector_norm, we would not need torch.linalg.vector_norm to offer a dim argument, because torch.norm only computes a vector norm across multiple dimensions if the dim argument is None. Therefore, it probably makes the most sense for torch.linalg.vector_norm to not offer the dim argument for now. However, we could add it in a future PR.
cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk