
Add torch.linalg.vector_norm and torch.linalg.matrix_norm functions #50214

Closed: kurtamohler closed this issue 3 years ago

kurtamohler commented 3 years ago

Update!

From offline discussion with @kurtamohler and @ngimel, for the 1.9 release we plan to:

- add torch.linalg.vector_norm
- add torch.linalg.matrix_norm

In the future we may even want to consider warning people away from torch.linalg.norm so that they use one of these two functions instead. From our discussions with other engineers, it seems that most use cases involve either vector norms or matrix norms and not a mix of the two, and while their conflation in torch.norm and torch.linalg.norm may now be familiar, it is still often confusing for users.

🚀 Feature

Add a function called torch.linalg.vector_norm, which computes the vector norm of an input with more than one dimension.

Motivation

torch.norm has the ability to compute vector norms of n-dimensional inputs, but torch.linalg.norm does not. In order to deprecate torch.norm (issue #49907), we need to offer the missing functionality. Originally, the idea was to add a flag to torch.linalg.norm that offers n-dimensional vector norm support, but that was determined to be bad UX design, since the new argument would be incompatible with the behavior of the dim argument in torch.linalg.norm. Instead, it makes more sense to implement a separate torch.linalg.vector_norm function.

Pitch

torch.linalg.vector_norm should have the ability to replace any torch.norm call that computes a vector norm across all dimensions of an n-dimensional tensor. Therefore, it should support all the arguments that torch.norm has, except for the dim argument.
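As a rough sketch of the intended usage (torch.linalg.vector_norm did not exist yet when this issue was filed, and its final signature is assumed here to mirror torch.norm's ord argument):

    import torch

    t = torch.randn(2, 3, 4)

    # Today: with dim=None, torch.norm flattens the input and computes a vector norm.
    full_reduction = torch.norm(t, p=3)

    # Proposed replacement: treat the whole tensor as a single vector.
    proposed = torch.linalg.vector_norm(t, ord=3)

    assert torch.allclose(full_reduction, proposed)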

Initial discussion is here: https://github.com/pytorch/pytorch/pull/50055#discussion_r553004871

Alternatives

torch.linalg.vector_norm could potentially offer a dim argument which would specify which dimensions to compute the vector norm across, instead of computing across all dimensions. In order to replace all torch.norm calls with either torch.linalg.norm or torch.linalg.vector_norm, we would not need torch.linalg.vector_norm to offer a dim argument, because torch.norm only computes a vector norm across multiple dimensions if the dim argument is None. Therefore, it probably makes the most sense for torch.linalg.vector_norm to not offer the dim argument for now. However, we could add it in a future PR.
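For concreteness, a sketch of how an optional dim argument might behave if it were added later (hypothetical at the time of this proposal):

    import torch

    t = torch.randn(2, 3, 4)

    # Hypothetical dim support: reduce over a subset of dimensions,
    # treating each remaining slice as a flattened vector.
    per_slice = torch.linalg.vector_norm(t, ord=2, dim=(1, 2))  # shape: (2,)

    # Reference computation via an explicit reshape.
    reference = torch.norm(t.reshape(2, -1), p=2, dim=1)
    assert torch.allclose(per_slice, reference)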

cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk

kurtamohler commented 3 years ago

cc @mruberry @ngimel

mruberry commented 3 years ago

I like the idea of supporting a dims argument, but let me ask a more general question:

Currently torch.norm and torch.linalg.norm support a variety of norms computed with radically different methods. They almost seem like "clearing houses" or "routers" for picking a norm. Would it be simpler if we implemented the vector norms and the matrix norms that comprise torch.norm and torch.linalg.norm as separate functions, and then torch.linalg.norm and torch.linalg.vector_norm can select among these functions?
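A minimal sketch of that routing structure, with invented helper names and a deliberately simplified dispatch condition (not PyTorch internals):

    import torch

    def _vector_norm_impl(x, ord, dim, keepdim):
        # every vector order is an elementwise map followed by a reduction
        ord = 2 if ord is None else ord
        return torch.linalg.vector_norm(x, ord=ord, dim=dim, keepdim=keepdim)

    def _matrix_norm_impl(x, ord, dim, keepdim):
        # matrix norms would live here, possibly split further into
        # SVD-based orders (2, -2, 'nuc') and abs-sum-based orders (1, -1, +/-inf)
        raise NotImplementedError

    def norm_router(x, ord=None, dim=None, keepdim=False):
        # simplified dispatch: treat the call as a matrix norm only when an
        # order is given and the reduction covers exactly two dimensions
        two_dim_reduction = (dim is None and x.ndim == 2) or (
            isinstance(dim, tuple) and len(dim) == 2)
        if ord is not None and two_dim_reduction:
            return _matrix_norm_impl(x, ord, dim, keepdim)
        return _vector_norm_impl(x, ord, dim, keepdim)

For example, norm_router(torch.randn(5), ord=3) would take the vector path, while norm_router(torch.randn(4, 4), ord='nuc') would take the matrix path.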

kurtamohler commented 3 years ago

@mruberry, your suggestion makes sense. torch.linalg.norm could simply route directly to at::linalg_vector_norm (the at:: implementation of torch.linalg.vector_norm) when its vector norm conditions are met. This would require at::linalg_vector_norm to offer a dim argument, since torch.linalg.norm offers the ability to calculate a vector norm across just one dimension of an n-dimensional input.

I'm also now thinking that torch.linalg's norm methods should probably not rely on at::norm internally (although I think relying on norm_stub is still alright, since that would be a lot of code to duplicate). That way, it would be simpler to change at::norm without affecting at::linalg_vector_norm, and vice versa. Furthermore, when we replace torch.norm with torch.linalg.norm or torch.linalg.vector_norm, we could also think about replacing at::norm calls with at::linalg_vector_norm calls, and then deleting at::norm.

Related: torch.norm and torch.linalg.norm give different results for some degenerate tensors and extreme values. I will need to find out whether any internal calls rely on torch.norm's behavior in these edge cases. If they do, it's most likely not a good idea to simply replace them; we'd probably need a deprecation strategy.

mruberry commented 3 years ago

It's true that linalg.norm could route to linalg.vector_norm when it is computing a vector norm. But I was thinking of something even more fine-grained, like _matrix_two_norm, _matrix_neg_two_norm, _vector_zero_norm, and so on. These functions don't need to be exposed to users. This might be more readable.

ngimel commented 3 years ago

All vector norms and the Frobenius norm are implemented in the same way, via a reduction. For matrix norms, splitting into separate functions might indeed make sense, but even then I'd leave the +/-inf and +/-1 matrix norms in the same function, since they share a lot of implementation. That leaves the nuclear norm and the +/-2 norms, which could also live together because they both rely on first computing an SVD. Question: why is the Frobenius norm implemented as

    if (self.is_complex()){
      result_ = at::sqrt(at::sum(at::real(self.conj() * self), dim_, keepdim));
    } else {
      result_ = at::sqrt(at::sum((self * self), dim_, keepdim));
    }

? This is super inefficient compared to vector norm, and produces the same result.
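To make that concrete, a small sketch of the equivalence (using torch.linalg.vector_norm purely for illustration; it did not exist yet at this point in the discussion):

    import torch

    t = torch.randn(5, 6, dtype=torch.complex64)
    dim, keepdim = (0, 1), False

    # The quoted implementation: materializes conj(t) * t, then reduces.
    fro_old = torch.sqrt(torch.sum(torch.real(t.conj() * t), dim, keepdim))

    # Single-reduction formulation: reduce |t|^2 directly, without an
    # intermediate full-size product tensor.
    fro_new = torch.linalg.vector_norm(t, ord=2, dim=dim, keepdim=keepdim)

    assert torch.allclose(fro_old, fro_new)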

kurtamohler commented 3 years ago

Got it, that makes sense.

I'm not sure we should separate the vector norm orders into _vector_zero_norm and so on, though, because that would require duplicating the logic for selecting the appropriate vector norm order in both at::linalg_norm and at::linalg_vector_norm. Would it make more sense to just have a single _vector_norm that encapsulates this logic (sketched below)?

EDIT: I posted the above paragraph before my page reloaded with Natalia's comment, which mentions this more concisely.
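A minimal sketch of what that single helper could look like (the name _vector_norm comes from the paragraph above; the formulas are just the standard vector-norm definitions, not PyTorch internals, and only the full reduction over all elements is shown):

    import torch

    def _vector_norm(x, ord=2):
        # One helper owns the order-selection logic so that at::linalg_norm
        # and at::linalg_vector_norm would not have to duplicate it.
        a = x.abs().flatten()
        if ord == 0:
            return (a != 0).sum().to(a.dtype)
        if ord == float('inf'):
            return a.max()
        if ord == float('-inf'):
            return a.min()
        return (a ** ord).sum() ** (1.0 / ord)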

Also, if I understand correctly, with this fine-grained approach at::linalg_norm would need its own specialized entry in derivatives.yaml, in addition to the derivative formula for at::linalg_vector_norm. If, instead, at::linalg_norm called at::linalg_vector_norm, the derivative for at::linalg_norm in the vector case would automatically be traced to that of at::linalg_vector_norm. The fine-grained approach is alright with me if the pros outweigh the cons; I'm just pointing out one of the possible cons.

kurtamohler commented 3 years ago

@ngimel, you're right, the Frobenius norm is the same as a vector 2-norm, so it should share the same implementation as the vector norm.

mruberry commented 3 years ago

I guess the way I'm thinking of it, torch.linalg.norm would either call torch.linalg.vector_norm or a matrix norm impl, and those underlying functions would have derivatives, so torch.linalg.norm would be a composite operation and not need a derivatives.yaml entry. We can VC through the details sometime, but I think we're all happy to proceed with torch.linalg.vector_norm.
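A tiny illustration of the composite-op point (composite_norm is a hypothetical wrapper; today's torch.linalg.vector_norm stands in for the proposed building block):

    import torch

    def composite_norm(x):
        # A composite op only calls differentiable building blocks, so autograd
        # traces through it and it needs no derivatives.yaml entry of its own.
        return torch.linalg.vector_norm(x, ord=2)

    x = torch.randn(3, 4, requires_grad=True)
    composite_norm(x).backward()
    print(x.grad)  # gradient comes from vector_norm's own backward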

vadimkantorov commented 3 years ago

Would there be a separate matrix_norm function for matrix norms?