lferry007 / LargeVis

Apache License 2.0
706 stars 167 forks source link

Potential bug in LargeVis::search_reverse_thread #9

Open Quasimondo opened 8 years ago

Quasimondo commented 8 years ago

I believe there is a small bug in the search_reverse_thread call - I am not sure what the correct fix is, but the current version does not really make sense to me:

`void LargeVis::search_reverse_thread(int id)
{
    long long lo = id * n_vertices / n_threads;
    long long hi = (id + 1) * n_vertices / n_threads;
    long long x, y, p, q;
    for (x = lo; x < hi; ++x)
    {
        for (p = head[x]; p >= 0; p = next[p])
        {
            y = edge_to[p];
            for (q = head[x]; q >= 0; q = next[q])
            {
                if (edge_to[q] == x) break;
            }
            reverse[p] = q;
        }
    }
}`

In the inner loop y gets assigned the index of a connected node, but then y is not being used at all. Since it's supposed to be a backwards search my guess is that the fix would be to replace head[x] with head[y] in the inner loop - so there is a search back from the other edges neighbors until the two paths meet.

`void LargeVis::search_reverse_thread(int id)
{
    long long lo = id * n_vertices / n_threads;
    long long hi = (id + 1) * n_vertices / n_threads;
    long long x, y, p, q;
    for (x = lo; x < hi; ++x)
    {
        for (p = head[x]; p >= 0; p = next[p])
        {
            y = edge_to[p];
            for (q = head[y]; q >= 0; q = next[q])
            {
                if (edge_to[q] == x) break;
            }
            reverse[p] = q;
        }
    }
}`
DataWaveAnalytics commented 8 years ago

Just curiosity...Did you run any test using this change?

sonjageorgievska commented 8 years ago

Hi, I also had to make this change, otherwise it was crashing. I ran it on the CondMat example and it produced acceptable results.

sonjageorgievska commented 8 years ago

I have to mention that before fixing this bug, I had to correct the sample_an_edge method to something like this:

long long LargeVis::sample_an_edge(real rand_value1, real rand_value2)
{
    long long k = (long long)(n_edge * rand_value1) - 1;
    k = max(k, 0);
    return rand_value2 <= prob[k] ? k : alias[k];
}

Without this correction, I had "index out of range" when rand_value1 would be 1 or when rand_value2 would be one. This is a quick fix, though, but worked. (Potentially another issue)

DataWaveAnalytics commented 8 years ago

That last change shouldn't be necessary because according to GSL's documentation you shouldn't get a 1 running _gsl_rnguniform.

_Function: double gsl_rng_uniform (const gslrng * r) _This function returns a double precision floating point number uniformly distributed in the range [0,1). The range includes 0.0 but excludes 1.0. The value is typically obtained by dividing the result of gsl_rng_get(r) by gsl_rngmax(r) + 1.0 in double precision. Some generators compute this ratio internally so that they can provide floating point numbers with more than 32 bits of randomness (the maximum number of bits that can be portably represented in a single unsigned long int).

Which dataset were you using when crashed?

I removed the GSL dependency in my code. I'm using the stl uniform_real_distribution to improve portability. I did tests with this change and I got acceptable results as well.

Next, I did a test using MNIST dataset with the change proposed here, and I get different results (I got more clusters (13) than the typical ten you can find around). So, I think it deserves more debugging.

sonjageorgievska commented 8 years ago

I used the CondMath_network dataset.

I agree that this deserves more debugging or a separate issue.

DataWaveAnalytics commented 8 years ago

I guess the bug is in the construction of prob or alias vectors. I'm contacted the original authors and Jian (one of the authors) confirmed this his original source code. Probably, they would give us more insights to solve them.

DataWaveAnalytics commented 8 years ago

After doing some research, it looks like there is a rounding problem between double and float (real). The details are explained here.

I changed the sample_an _edge method using the solution proposed in the post. Note you should also change the method declaration in LargeVis.h due the new type in the parameters.

long long LargeVis::sample_an_edge(double rand_value1, double rand_value2){
    real rv1 = rand_value1;
    real rv2 = rand_value2;

    if(rv1 > rand_value1)
        rv1 = std::nextafter(rv1, -std::numeric_limits<real>::infinity());

    if(rv2 > rand_value2)
        rv2 = std::nextafter(rv2, -std::numeric_limits<real>::infinity());

    long long k = (long long)(n_edge * rv1);
    return rv2 < prob[k] ? k : alias[k];
}
sonjageorgievska commented 8 years ago

Update: I tried this solution, but at progress of 2.84% I got again the "index out of range" crash. So, for the moment I am still using the fix that I wrote about earlier...

DataWaveAnalytics commented 8 years ago

I've running several tests without getting any segmentation fault. Please, could run tests using gdb and post the output here?.

gdb --args ./largevis -input ... then write run and press Enter.