Confusing PORSAC sampling strategy?

ziruiw-dev commented 7 years ago

Hi Chris,

I found there is probably a mistake in how you implemented the PROSAC sampling algorithm.

According to the algorithm 1 in the original PROSAC paper[Chum 2005], the if/else statements are:

if Tn' < t, sample m-1 points from U_n-1 and sample one point u_n

else, sample m points from U_n.

But what you write in the prosac_sampler.h is exactly opposite:

if (t_n_prime < kth_sample_number_) {
        // Randomly sample m data points from the top n data points.
        std::vector<int> random_numbers;
        for (int i = 0; i < m; i++) {
            // Generate a random number that has not already been used.
            int rand_number;
            while (
                   std::find(random_numbers.begin(),
                             random_numbers.end(),
                             (rand_number = rand() % n)) != random_numbers.end()
                   ) {}

            random_numbers.push_back(rand_number);

            // Push the *unique* random index back.
            subset.push_back(sortedMatches[rand_number].queryIdx);
        }
    } else {
        std::vector<int> random_numbers;
        // Randomly sample m-1 data points from the top n-1 data points.
        for (int i = 0; i < m - 1; i++) {
            // Generate a random number that has not already been used.
            int rand_number;
            while (
                   std::find(random_numbers.begin(),
                             random_numbers.end(),
                             (rand_number = rand_number = rand() % (n - 1))) != random_numbers.end()
                   ) {}
            random_numbers.push_back(rand_number);

            // Push the *unique* random index back.
            subset.push_back(sortedMatches[rand_number].queryIdx);
        }
        // Make the last point from the nth position.
        subset.push_back(sortedMatches[n].queryIdx);
    }

Hope you don't mind me raise this issue, and is this a bug in your code or a wrong algorithm in the paper?

Best, Ryan

sweeneychris commented 7 years ago

Hi Ryan, I'd need to take a look back through the paper to verify but I seem to remember a discussion with @vfragoso where we decided that the prosac paper had a typo and that this was the correct way to do it.

@vfragoso can you verify the correct-ness of this code? It might be that I am misremembering.

ziruiw-dev commented 7 years ago

hi Chris, Thanks for your prompt reply! Yeah I think it's a quite simple if/else statement so it seems like you did it on purpose. Look forward to an accurate answer from vfragoso! --Best, Ryan

vfragoso commented 7 years ago

@drinking-tea I quickly checked the algorithm in the paper. I think the typo is here "if Tn' < t". I understand Tn' as an upper bound for t, since t is the sample counter. Thus, Tn' cannot be less than t. I think the paper meant to say that while t does not reach the number of samples marked by Tn', then sample m points from the subset. When t reaches Tn', then sample m-1 data points from the top n-1 data points, and use the last point in the n-th position.

ziruiw-dev commented 7 years ago

Hi Victor and Chris, @sweeneychris @vfragoso

Sorry for the delay and thanks for the explanation. I just have a thoroughly look and few experiments, and I think the paper is the correct one.

First, one trap is, the T_N will be updated by the 'maximality' stop criterion (Eq. 12), which means the T_N at the first time sample was 200 000, and next time the stop criterion algorithm will estimate the number of max iteration we need based on confidence, then the T_N will normally be decreased a lot. For example, T_N drops from 200 000 to 50 000.

Second, let's assume T_N will not be updated by the stop criterion and the PROSAC will still work perfectly, and just waste many useless loops even when we confident that we have already found the right answer. If you look at the equation two lines above eq 3, you will find the T_n is only depended on n, because T_N and m are constant. If you draw(or imagine) T_n, you will find it's monotonical increasing, BUT increasing SLOWER and SLOWER when n is large. The shape of T_n is very similar to log(n). Meanwhile, the t is growing linearly, so the T_n is not the upper bound for t.

Consider an application case for Homography calculation: we need 4 points (m = 4) in for each sample set, we set the maximum iteration (T_N to 200 000), and we have 80 pairs(2d-2d) of point in total, including inliers and outliers.

During the first sampling, the Tn(start from Tm = 4) will be around 0.1, Tn' = Tm' = 1. In this case, the condition (Tn' >= t (t=0 for the first time sampling)， Eq. 5）has already met, we don't want to increase n, so n = 4, we sample once from U_4. --Eq.5 meet, n = 4, sampling from U_4

The 2nd sampling, nothing changed but t = 1 for the second sampling. And the Eq. 5 Tn' = 1 >= t = 1 still fulfilled, n is not growing. --Eq.5 meet, n = 4, keep sampling from U_4

The 3rd sampling, t = 2, Tn' = 1 < t = 2, the Eq.5 is NOT met, we want to grow n, until T_n' >= t = 2. Fortunately, we only need to let n = 5, because T_5' = 2 will let Eq.5 be met. So now n grows to 5. --Eq.5 not meet, n increase from 4 to 5, sample 3 points from U_4, the rest one will be the u_5.

So, if you look at my "--" statement above, you will find the following rules, which are exactly same to the algorithm 1 in the Chum's paper: If Tn' >= t, don't increase n and sample m points from U_n if Tn' < t, increase n until Tn' >= t, then sample (m-1) points from U_n-1 (the n has been increased, not previous n that not meet Eq.5), and use u_n(a point, not a set) as the last sample point.

Hope I explained this clearly....

Best, Ryan

sweeneychris commented 7 years ago

Hi Ryan,

Thanks for such a detailed analysis! Do you think you could do a real data experiment with this to concretely prove that the correct strategy results in (presumably) fewer iterations? It should be really easy to set this up if you use one of the geometric estimators that call RANSAC methods under the hood: https://github.com/sweeneychris/TheiaSfM/tree/master/src/theia/sfm/estimators

vfragoso commented 7 years ago

Hi Ryan,

I have one concern with your explanation. According to the algorithm 1 in Page 4, the first statement in Step 1 is to increase t. In your explanation above, you are not increasing and considering t=0 for the first iteration. If we follow that algorithm strictly, then in the first iteration t = 1.

I quickly scripted this in Python to understand this. This is what the plot shows:

If T_N remains constant, then t_n_prime is larger than t, which makes t_n_prime >= t -- see the attached plot.

constant_t_n

If T_N is interpreted as the max_number of iterations, then t_n_prime can be less than t. The question is, if T_N is intended to be the max_number of iterations and if Theia uses T_N as such.

decrease_t_n

ziruiw-dev commented 7 years ago

Hi Chris, I am doing a project that applying PROSAC on solvePnP function.

I tried both if/else statements in different order quickly in my project(because the only difference is the sampling order), the algorithm in both orders quit at the 1st iteration with a correct-ish answer in most cases (90%).... so the performance is almost same in my case. I will try the Theia a bit later because I haven't installed it haha...

sweeneychris commented 7 years ago

@drinking-tea any updates on this?

ziruiw-dev commented 7 years ago

still working on the PROSAC implementation, but not this bit currently....

I drew the first graph provided by vfragoso by myself but still not sure. I checked another repo of prosac implementation, which is part of opencv 3.2(rho.cpp if you interested), and they have a comment in the opencv 3.2 prosac implementation following:

inline void   RHO_HEST_REFC::getPROSACSample(void){
    if(ctrl.i > ctrl.phEndI){
        /* FIXME: Dubious. Review. */
        rndSmpl(4, ctrl.smpl, ctrl.phNum);/* Used to be phMax */
    }else{
        rndSmpl(3, ctrl.smpl, ctrl.phNum-1);
        ctrl.smpl[3] = ctrl.phNum-1;
    }
}

I guess I will come back a bit later?...

sweeneychris commented 7 years ago

@drinking-tea I'm going to close this issue for now due to the ambiguity. feel free to reopen it if we see more evidence that the strategy is wrong.

sweeneychris / TheiaSfM

Confusing PORSAC sampling strategy? #144