moble / quaternion

Add built-in support for quaternions to numpy
MIT License
610 stars 85 forks source link

averaging rotations, why so many possibilities ? #163

Closed romainVala closed 3 years ago

romainVala commented 3 years ago

Hello, my apologies for this issue that is neither a bug nor a feature request. I just would like to better understand how to correctly compute the average of n rotation.

Reading a bit from publication I found on this topic, it seems my maths level is not enough to fully understand why there are so many ways to compute a mean of rotation.

I first to try to define what I expect for an average of rotation. let's consider 2 rotation R1 and R1 that move n points P0 to P1 and P2. The average of those 2 rotation should be a rotation Rmean that moves the points P0 to Pmean, where Pmean is the mean position between P1 and P2.. in the simple case where the 2 rotations are rotations around the same axis with an angle a1 and a2, then the mean rotation is rotation of angle = mean(a1,a2)

Now can I do it for any n rotations defines from euler angles on all three axes.

So here I will compare different way of computing the mean, from euler angle, from rotation matrix, or from quaternion. then I convert the resulting mean into euler angle, to see how different are those means.

euler_mean : it is simply the mean of each euler angles.

euler_exp : taken from this article "Linear Combination of Transformations" the average of a rotation matrix should be done with matrix log and exp

import scipy.linalg as scl
import numpy as np
import quaternion as nq

def exp_mean_affine(aff_list, weights=None):
    if weights is None:
        weights = np.ones(len(aff_list))
    #normalize weigths
    weights = weights / np.sum(weights)

    Aff_mean = np.zeros((4, 4))
    for aff, w in zip(aff_list, weights):
        Aff_mean = Aff_mean + w*scl.logm(aff)
    Aff_mean = scl.expm(Aff_mean)
    return Aff_mean

euler_polar : As you suggest here is this issue #120 one can simply average the rotation matrix, and then take the polar decomposition

def polar_mean_affin(aff_lis, weights=None):
    if weights is None:
        weights = np.ones(len(aff_list))
    #normalize weigths
    weights = weights / np.sum(weights)
    Aff_Euc_mean = np.zeros((3, 3))
    for aff, w in zip(aff_list, weights):
        Aff_Euc_mean = Aff_Euc_mean + w * aff[:3, :3]

    Aff_mean = np.eye(4)
    Aff_mean[:3,:3] = scl.polar(Aff_Euc_mean)[0]
    return Aff_mean

It is well know that quaternion is the way to go, to get proper interpolation between rotation (or dual quaternion for the general case of rigid motion). For instance Skinning with Dual Quaternions demonstrate the better interpolation with dual quaternion compare to the exponential matrix propose by Alexa I then conclude that it may also be important for this topic. and this is the reason I start looking at quaternion

euler_qr_slerp : use the slerp interpolation to compute the mean. The trick here is to decompose a mean over n element to successive addition of 2 number with changing the weight. (and the weighted mean of 2 number is then compute with slerp)

def qr_slerp_mean(qr_list):
    c_num, c_deno = 1, 2
    for ii, qr in enumerate(qr_list):
        if ii==0:
            res_mean = qr
        else:
            t = 1 - c_num/c_deno
            res_mean = nq.slerp(res_mean, qr, 0, 1, t) 
            c_num+=1; c_deno+=1
    return res_mean

Just to check the method is correct, here is the same fonction, but for real number, this is indeed equivalent to the standard mean function


def my_mean(x_list):
    #just decompose the mean as a recursiv interpolation between 2 number, (to be extend to slerp interpolation)
    c_num, c_deno = 1, 2
    for ii, x in enumerate(x_list):
        if ii==0:
            res_mean = x
        else:
            t = 1 - c_num/c_deno
            res_mean = np.interp(t,[0,1],[res_mean,x])
            #res_mean = (res_mean * c_num + x) / c_deno
            c_num+=1; c_deno+=1
    return res_mean
x = np.random.random(100)
np.mean(x)-my_mean(x)

euler_choral : an other way is to use your function mean_rotor_in_chordal_metric() (although I am not sure it is appropriate, since you told me here

that's referring to a different type of mean

how different ? is what I trying to evaluate here, but which one to choose ? is not clear at all to me ...

Here is the code to compare those 5 ways to compute a mean, I compute 50 times a means of 10 random rotation define from random euler angle:

import quaternion as nq
def get_affine_rot_from_euler(e_array):
    aff_list=[]
    for r in e_array:
        r = np.deg2rad(r)
        qr = nq.from_euler_angles(r)
        aff = np.eye(4, 4);
        aff[:3,:3] = nq.as_rotation_matrix(qr)
        aff_list.append(aff)
    return aff_list
def get_modulus_euler_in_degree(euler):
    euler = np.rad2deg(euler)
    for index, e in enumerate(euler):
        if e < -180:
            euler[index] = e + 360
        if e > 180:
            euler[index] = e -360
    return euler

def get_euler_from_qr(qr):
    return get_modulus_euler_in_degree(nq.as_euler_angles(qr))
def get_euler_from_affine(aff):
    qr = nq.from_rotation_matrix(aff)
    return get_modulus_euler_in_degree(nq.as_euler_angles(qr))

np.random.seed(4)
nb_mean=50
euler_mean, euler_choral, euler_exp, euler_pol, euler_slerp, euler_qr_slerp = np.zeros((nb_mean,3)), np.zeros((nb_mean,3)), np.zeros((nb_mean,3)), np.zeros((nb_mean,3)), np.zeros((nb_mean,3)), np.zeros((nb_mean,3))
for i in range(nb_mean):
    rot_euler = np.random.uniform(2,10,size=(10, 3)) #in degree
    print(f'min {np.min(rot_euler)} max {np.max(rot_euler)}')
    aff_list = get_affine_rot_from_euler(rot_euler) #4*4 affine matrix
    qr_list = [ nq.from_rotation_matrix(aff) for aff in aff_list]  #unit quaternion

    euler_mean[i,:] = np.mean(rot_euler,axis=0)

    qr_cm =  nq.mean_rotor_in_chordal_metric(qr_list); #print(get_info_from_quat(qr_cm)); print(get_euler_for_qr(qr_cm))
    euler_choral[i,:] = get_euler_from_qr(qr_cm)

    aff_exp_mean = exp_mean_affine(aff_list);     #print(spm_imatrix(aff_exp_mean)[3:6])
    euler_exp[i,:]  = get_euler_from_affine(aff_exp_mean)

    aff_polar_mean = polar_mean_affin(aff_list); #print(spm_imatrix(aff_exp_mean)[3:6])
    euler_pol[i,:] = get_euler_from_affine(aff_polar_mean)

    qr_mean = qr_slerp_mean(qr_list)
    euler_qr_slerp[i,:] == get_euler_from_qr(qr_mean)

print(f'max diff euler between choral and euler mean {np.max(np.abs(euler_choral - euler_mean))}')
print(f'max diff euler between exp matrix and euler mean {np.max(np.abs(euler_exp - euler_mean))}')
print(f'max diff euler between polar matrix and euler mean {np.max(np.abs(euler_pol - euler_mean))}')
#print(f'max diff euler between dq slerp and euler mean {np.max(np.abs(euler_slerp - euler_mean))}')
print(f'max diff euler between slerp and euler mean {np.max(np.abs(euler_qr_slerp - euler_mean))}')

print(f'max diff euler between slerp and choral  {np.max(np.abs(euler_qr_slerp - euler_choral))}')
print(f'max diff euler between slerp and exp  {np.max(np.abs(euler_qr_slerp - euler_exp))}')
print(f'max diff euler between slerp and polar  {np.max(np.abs(euler_qr_slerp - euler_pol))}')

print(f'max diff euler between exp matrix and choral {np.max(np.abs(euler_exp - euler_choral))}')
print(f'max diff euler between exp matrix and polar {np.max(np.abs(euler_exp - euler_pol))}')
print(f'max diff euler between choral and polar {np.max(np.abs(euler_choral - euler_pol))}')

when runing with relative small euler angles ([2 10], I get the following results

max diff euler between choral and euler mean 0.5771007165992952
max diff euler between exp matrix and euler mean 0.5792118663119004
max diff euler between polar matrix and euler mean 0.5770459422450864
max diff euler between slerp and euler mean 0.5773281912799781
max diff euler between slerp and choral  0.0006857539330056284
max diff euler between slerp and exp  0.0038490835044502347
max diff euler between slerp and polar  0.0017496106805383604
max diff euler between exp matrix and choral 0.003745340455438395
max diff euler between exp matrix and polar 0.0034720351873698263
max diff euler between choral and polar 0.0011792367475038645

So the basic mean of euler angle is not that bad ! maximum error found is 0.57 °, and the other methods, are much more similar, with less than 0.001° maximum error

If now i re-run but just changing the range of euler angle to 2 30 (degree)

rot_euler = np.random.uniform(2,30,size=(10, 3)) #in degree then I get much differences compare to basic euler mean

max diff euler between choral and euler mean 3.4405882549275937
max diff euler between exp matrix and euler mean 3.477347445081607
max diff euler between polar matrix and euler mean 3.4494764238804567
max diff euler between slerp and euler mean 3.4314367559995063

and other method are still similar although euler_exp seem to start to sligthly diverge (max diff aroun 0.1)

max diff euler between slerp and choral  0.05032053106625689
max diff euler between slerp and exp  0.1275873270845338
max diff euler between slerp and polar  0.06344169651966247
max diff euler between exp matrix and choral 0.12007111035800833
max diff euler between exp matrix and polar 0.11048402024126425
max diff euler between choral and polar 0.04851378454809563

So here is my 'informatic" way to understand what is going on, but a more theoretical understanding would be much better, ... any clues would be appreciate

Many thanks

moble commented 3 years ago

I just would like to better understand how to correctly compute the average of n rotation.

Remember that there is no single "correct" average. Instead, there are numerous ways — as you have found — to define various things that might roughly correspond to what you intend in a particular situation. The only way to decide if one definition is the best for some particular purpose is to investigate it for that particular purpose. You've discovered this for yourself, too. But I worry that the measure you've defined is not a very good one, just because Euler angles really are terrible.[1] In particular, your measure is not "invariant", which means that your results will change if you just rotate your coordinates. I suggest switching to quaternion.rotation_chordal_distance or quaternion.rotation_intrinsic_distance.

I suggest taking a step back and thinking about the rotation group itself. Unit quaternions, rotation matrices, and Euler angles are all just ways of putting the rotation group on computers, so fundamentally they sort of represent the same thing. In particular, the logarithm and exponential of quaternions are very closely related to the logarithm and exponential of rotation matrices, and — in some ways — to the Euler angles and resulting rotations themselves. But there are lots of ways in which quaternions are the most "faithful" of these, so let's stick with them for now.

The quaternions can be thought of as just a regular four-dimensional space, so the unit quaternions can be thought of as a three-dimensional sphere in that four-dimensional space. It may be a little hard to visualize four dimensions, but luckily you can get most of the intuition you need just by thinking of an ordinary two-dimensional sphere in three dimensions, and the standard spherical coordinates.[2]

So now we can think of a simple but useful analogy. Imagine trying to define the average location of a bunch of points on a sphere. You'll probably want the average to be a point on the sphere that minimizes the average distance to all the other points. But your result will depend on how you define distance between two points.

The first of these depends on how the streets are arranged, so it isn't geometrically meaningful. In the same way, Euler angles depend on the coordinates you chose when defining the Euler angles. In particular, your Euler measure will give a huge (possibly infinite!) amount of weight to points close to the poles, compared to points near the equator. If you had just decided to use different coordinates, that pole would be in a different place, so you would effectively give all your points completely different weights.

The second and third don't depend on coordinates at all, which implies they are both geometrically meaningful, which is why I use them in the code. But they are different. The results are nearly equal for points that are close to each other, but as the points get farther away, the intrinsic distance grows more quickly than the chordal distance. So the intrinsic average will give more weight to points that are farther away from the average. In statistics, we would say that this is more sensitive to "outliers". They're different, but neither is "wrong", so neither is "right".

Now go back to the definition of the average. I suggested that you want to minimize the distances between the average and all the other points. But why not the square of the distances? There are examples of this in the literature. The results again give much more weight to "outliers", but I can't say that they're wrong.

Or maybe you are less confident in your measurement of the locations of some points than others, so why not manually give different weights to different points?

Or maybe it's easier for you to compute things with rotation matrices. Just as with quaternions, you can define different measures of distances between rotation matrices — including intrinsic and chordal measures — that won't be precisely the same as the quaternion measures.

Or maybe you want to talk about "generators" of rotations (which are essentially the logarithms of quaternions or matrices).

Or... I could go on.

So the reason there are so many ways to average rotations is just because there are lots of ways to represent rotations, lots of ways to measure "distances" with each of those representations, and lots of ways (and weights) to combine the distances of various points.[3]

We're still left with the question of which one to use. The "best" answer depends on your precise requirements. But for me, simplicity and speed are important requirements, and I haven't seen any reason to think that any other definition is ever any better than just the simple and fast mean_rotor_in_chordal_metric.


[1] The only time Euler angles are reliable is when you're doing analytical integration over the rotation group. It doesn't sound like you're interested in that, so it really is best to just stay away from them entirely.

[2] There's another complication due to the fact that the unit quaternions actually map projectively onto the rotation group (meaning that q and -q represent the same rotation). But at least if we stick to one hemisphere, this doesn't actually matter, so we can leave that aside for now.

[3] For reference, here are some references that present various ways of computing averages:

romainVala commented 3 years ago

Hi thanks for your detail comment, it really help to better understand.

I understand that taking euler angle difference to compare the different results is not a good idea. Unfortunately I can not stay totally away from them, since it is the common parametrisation that is use in the MRI community (or more generally in processing 3D volumique data). For instance in order to co-register 2 MRI volumes, the output result will be 3 euleur angles and 3 translations. I guess internally they stay with the affine matrix representation and use the euler just as output to store the results (so it may not be a big deal) In my case I will then have to take them as input, but I can start to convert them to quaternion for the different task. As you first pointed quaternion/ matrix / euler represent the same thing, (ie there is a one to one relation between different representation (exept of course for the gimbal problem). so it should be fine ...

Coming back to my first question on averaging, I understand the average is define by the metric choice, and there are different possibility. But I can not make my mind with your statement

Remember that there is no single "correct" average.

In one of my use case, I study effect of motion on MRI acquisition in a simulation framework. So I have a generative model of MRI acquisition and I add a time course of head motion during the long acquisition time. I then compare the original volume to the one obtain with a given motion time course (artefacted volume) .

The problem is that I do observe sometimes, a globale shift of the object (compare to the original volume), which I need to remove. Finding this global shift is equivalent to finding a weighted mean of my motion time course. (to be convince, I take the problem from the end : Suppose one add a constante rotation to the motion time course, the resulting volume, will be the same plus a constante shift induce by this constante rotation ...)

Note that it is not a simple average but a weighted average because MR acquisition is taking place in the fourier space of the object, so this is the position near the k-space center that matter. So the extra difficulty I try to solve is to know which weights need to be taken for the weighted average

So to resume: I have a generative process, and in this case, there a unique ground truth of what will be the resulting global shift induced. It seems logical that it is related to a weighted average, but mathematicians say that there is not a unique way to compute average ... and it is still difficult to admit, since I do observe a unique shift ....

If I do the same but with only translation, then there is a unique (true) average, which is simply the translation vector average ... so why there is not the equivalent for rotations ...

I understand that given the different ways to represent rotation there are different way to define distance (and then the average) but at the end of the day all those different representation refer to the same motion, for which a unique average should exist ... what do I miss ?

connorjak commented 3 years ago

I think your objective is to minimize rotational shifts in the data that could skew results when overlaying datasets. So you're looking for something like a least-squares "center of mass" style average for where the motion moves the rotation, in order to best fit the aberrated data to a reference?

My intuition is that the "intrinsic" distance type most closely represents your problem.

romainVala commented 3 years ago

Yes exactly, I could compute for each object the n different positions and take then the euclidean average of the position, to then find the rotation that bring the object back to the original. (in order to "remove" it from the n rotations, used in the simulation)

But I want to do this for any object, so to be able to compute it directly from the rotations. what do you mean by "center of mass" : of the object or of the rotations ?

Note that the complet problem would be with rotation from different origine plus a translation term. But as far as I understand the dual quaternion representation would easily extend the average rotation. (At least I hope ...)

moble commented 3 years ago

what do I miss ?

This part:

The only way to decide if one definition is the best for some particular purpose is to investigate it for that particular purpose.

You need to step back and think about exactly what you want. Ignore quaternions for the moment, and go back to your data, and think about what you actually want to do with it.

I'm not exactly clear on how your data are represented, but at a basic level I guess the quantity you measure is density (or something similar) as a function of position: ρ(𝒙). It sounds like you have multiple measurements of more-or-less the same object — and therefore roughly the same underlying density field — and you want to line them up with each other as well as possible. Obviously if they were identical objects, you could do this with just a translation and rotation. You can deal with the translation by just computing the center of mass of the density field: ∫ ρ(𝒙) 𝒙 d³𝒙. This is invariant under rotations about the center of mass, so you can just bring the center of mass to — say — the origin of coordinates. You can do that for each measurement you have, and just be done with translations. (If you want to use dual quaternions / conformal geometric algebra to do all of this at once, it is possible to do so, but nothing you've written suggests that's necessary.)

Now, the game is just to rotate about the origin to match your different measurements to each other. There are lots of ways I can imagine doing this. The most obvious (though probably not easiest) way would be to optimize ∫ |ρ₁(𝒙) - ρ₂(𝑹𝒙𝑹̄)|² d³𝒙 over the quaternion 𝑹. Another way would be to measure the second moments of the density fields (the center of mass being the first moment), and then solve for the rotation that takes one moment onto the other. Another way would be to establish a minimal set of reference points that let you define the orientation of the object, and then just solve algebraically for the rotation that takes one onto the other. Another way would be to measure numerous reference points and minimize the average distance between corresponding points Σᵢ|𝒙ⁱ₁ - 𝑹𝒙ⁱ₂𝑹̄|².

My point is that, even in this simplified description, there are again many ways you could approach the problem. So the mathematics can't tell you which one you want to do; you have to decide. If your chosen method leads you back to averaging quaternions, then I bet it will either indicate which type of average you want, or it won't matter which type of average you use. (In the limit that your different measurements really are just rotated and translated, it shouldn't matter which type of average you use.)

romainVala commented 3 years ago

Thank very much for taking time on my particular problem, I really appreciate, and I hope I can continue a little bit without being to annoying .

Pose estimation (and hence relative translation/rotation estimation ) is an other problem. I will have also to look more closely to it, but for now my first problem is not exactly that.

I need to find the average in a context of simulation, which mean that I choose and set the different rotations that the object will have. In other words, I am looking at a generative process, that will generate n rotations with a zero average.

So taking your last statement all different means should give me the same average. (since in the simulation context I am pretty sure that the different transform are really just rotations and translations).

Consider the coding example I put, I though it resume exactly what I am looking at: Take n rotations and compute there average ... does it make sense ? The fact I found different results may be due to just rounding errors in the different computations. Propagation of the error, since I used as generator the euler angle description which may for some random case induce much larger errors ...

About the different distance metrics, I founds those notes which make sense in relation to what we discuss here. My understanding is that ok, those metrics are different, but they will have the exact same minimum. So for an optimization problem (or the task to find the average), it should led to the same no ?

PS : By the way, do I understand correctly that your Chordal distance is the L2 norm of the quaternion difference ? why to chose an other name ?