scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License
12.95k stars 5.15k forks source link

BUG: Rotation.align_vectors documentation does not match the code implementation #15907

Closed alexander-soare closed 2 years ago

alexander-soare commented 2 years ago

Describe your issue.

The description of RMSD here is not matched by the code implementation here. The latter needs to be divided by np.sqrt(len(a)) to match the documentation.

The above is a full description of the problem, but you may follow the discussion in this Stack Overflow question to learn more.

SciPy/NumPy/Python version information

1.7.2 1.20.3 sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)

WarrenWeckesser commented 2 years ago

The documentation of the return value of align_vectors says

rmsd : float
    Root mean square distance (weighted) between the given set of
    vectors after alignment. It is equal to ``sqrt(2 * minimum_loss)``,
    where ``minimum_loss`` is the loss function evaluated for the
    found optimal rotation.

The formula shown in the description matches the value that is returned, so I think the only problem is the use of the word "mean". sqrt(2 * minimum_loss) is not the root-mean-square-distance, it is the root-sum-square-distance. At a minimum, that description should be changed to something like

rssd : float
    Square root of the sum of the squared distances (weighted) between
    the given set of vectors after alignment. It is equal to
    ``sqrt(2 * minimum_loss)``, where ``minimum_loss`` is the loss
    function evaluated for the found optimal rotation.

The alternative is to modify the code to return the actual RMSD (and update the docstring accordingly), but that is a backwards incompatible change.

alexander-soare commented 2 years ago

@WarrenWeckesser yes makes sense. When I originally posted the Stack Overflow question I was assuming that the weights would be 1/num_point_pairs if not specified, but with the weights defaulting to 1 it is indeed the sum. May I suggest a more straight forward wording

rssd : float
    Square root of the weighted sum of the squared distances between
    the given set of vectors after alignment. It is equal to
    ``sqrt(2 * minimum_loss)``, where ``minimum_loss`` is the loss
    function evaluated for the found optimal rotation.
WarrenWeckesser commented 2 years ago

That looks fine to me, thanks. Are you interested in submitting a pull request with the updated documentation?

You're right about the weights: if the default weights were the constant value 1/n instead of 1, then the returned value rmsd would be the true RMSD. In general, one could imagine an implementation in which the given weights were always normalized such that the sum of the weights is 1, and the default weight of 1/n is just a special case of that requirement.

alexander-soare commented 2 years ago

Are you interested in submitting a pull request with the updated documentation?

Will do shortly :)

pmla commented 2 years ago

I should have caught this problem in review back when we developed this method.

It was actually the intention that the RMSD (according to the usual definition) be returned. I am still in favour of this, although it is an awkward situation we have created. It would be great if we could get back to the RMSD via a deprecation cycle, but I can't immediately think of an elegant solution.

alexander-soare commented 2 years ago

I agree that RMSD makes more sense. Can you help me understand something about deprecation cycles? What I would do here is put a deprecation warning for the next version then the version after that switch it out and keep a warning there for one more version. (It's not really "deprecation" though, as we'd be changing something not removing it). But still i see no harm. The way I work with my projects that I actually care about is I pip freeze with version numbers. Isn't that just good practice? Are popular libraries like scipy really doomed to keep every nick and scratch they ever accumulate for fear of bc breaking changes (even on non-core functionality). Asking for purely educational purposes

On Fri, 1 Apr 2022, 21:42 Peter Larsen, @.***> wrote:

I should have caught this problem in review back when we developed this method.

It was actually the intention that the RMSD (according to the usual definition) be returned. I am still in favour of this, although it is an awkward situation we have created. It would be great if we could get back to the RMSD via a deprecation cycle, but I can't immediately think of an elegant solution.

— Reply to this email directly, view it on GitHub https://github.com/scipy/scipy/issues/15907#issuecomment-1086306232, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD6G5FNZGBVC3CC4QZI7BJDVC5NU3ANCNFSM5SFHPYDA . You are receiving this because you authored the thread.Message ID: @.***>

rgommers commented 2 years ago

I am still in favour of this, although it is an awkward situation we have created. It would be great if we could get back to the RMSD via a deprecation cycle, but I can't immediately think of an elegant solution.

That is a little tricky indeed. We cannot change the numerical result that is now returned as the second return value, even with a deprecation cycle.

One thing that could be considered is returning both. This can be done via a Bunch class, similar to how we use that in other places where tuple unpacking needs to remain backwards-compatible. Although the length of the returned tuple depending on return_sensitivity makes that a little more complicated here (but still doable if really desired).

@pmla do you think it's worth opening a new issue for that?

The way I work with my projects that I actually care about is I pip freeze with version numbers. Isn't that just good practice?

It's not, you cannot rely on such a thing with widely used code.

Are popular libraries like scipy really doomed to keep every nick and scratch they ever accumulate for fear of bc breaking changes (even on non-core functionality). Asking for purely educational purposes

I think https://numpy.org/neps/nep-0023-backwards-compatibility.html will answer your questions. We can deprecate things, but not if they silently change the type of answer that is given. At that point, deprecation the function/method completely and replacing it with a new one is better.

WarrenWeckesser commented 2 years ago

There is no deprecation required for fixing a bug where the implementation is obviously incorrect and not the documented behavior. If we are flexible in our definition of "bug", I suppose we could accept the given documentation that says "Root mean square distance (weighted) between the given set of vectors after alignment" as the intended behavior, and make the change to the desired behavior as a bug fix, with no deprecation (but certainly with a release note about the change). However, the documentation also gives the formula for what is implemented, and that formula matches the current behavior, so the argument that it would be just a bug fix is weak. After looking at it again now, I would be OK with the "bug fix" argument, but I understand if others find it unacceptable. The use of a Bunch would also be fine. As @rgommers noted, that approach is made slightly more complicated by the return_sensitivity parameter, but I think that just means there are two possible Bunch types that can be returned.

rgommers commented 2 years ago

Anyone who uses that return value will have noticed this probably, and is likely to either use it as is or correct for the problem by dividing by 1/n. So it seems unhealthy to consider this a bug fix. Better to rename the method then.