Closed wlruys closed 3 years ago
Does this happen in both the CPU and GPU code? Is it related to the number of GPUs? If so, it is a bug.
I ran the old code to see if it was also leaking accuracy. The main conclusion at the moment is that it's hard to tell in the first few iterations, but it's possible we're seeing the same behavior.
I'm running more trials of the new code at the moment but the queue is being slow tonight.
This old version of the code has the larger-magnitude random perturbation in `distributed_median`. I ran the old code exactly as it was in commit f19add0d63e9c9a3d6e083a8207455362e9f8fad (with a new compilation fix from the current code).
^ This is with the URL dataset (2 local tree iterations per iteration, 1170 ppL).
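For context, the kind of tie-breaking perturbation referred to above might look like the sketch below. The function name, magnitude, and seeding are illustrative assumptions, not the actual `distributed_median` implementation:

```python
import numpy as np

def perturb_projections(proj, magnitude=1e-6, seed=0):
    """Add small random noise to projection values before a median split.

    With sparse data many projections can tie exactly; a tiny
    perturbation keeps the split from putting all tied points on
    one side. (Hypothetical sketch; magnitude is an assumption.)
    """
    rng = np.random.default_rng(seed)
    return proj + magnitude * rng.standard_normal(proj.shape)
```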
Can you compare with the final partition that Chao builds? You don't need all nodes. Just look at a couple of leaves. Just make sure that you have the same tree. Also, you can test it on a random dense dataset which you just pad a few extra dimensions with zeros; so that you can ensure that the projections are not nearly zero and that all the points are different.
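One way to build the suggested test set, as a minimal sketch (the helper name and sizes are illustrative):

```python
import numpy as np

def padded_dense_dataset(n=1024, d=16, pad=8, seed=0):
    """Random dense points with `pad` extra all-zero dimensions.

    The dense Gaussian block keeps projections well away from zero
    and makes all points distinct with high probability, while the
    zero columns let the data exercise a sparse code path.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    return np.hstack([X, np.zeros((n, pad))])
```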
Is it possible that the bug is on the neighbor updates and not on the tree construction / distance calculation?
> Is it possible that the bug is on the neighbor updates and not on the tree construction / distance calculation?
It's possible; I changed some of that code as well. I would expect to see duplicate points showing up (multiple points at zero distance, etc.) if there were a merging error, though, and I don't think I'm seeing that in my output.
> Can you compare with the final partition that Chao builds?
I can try; I'll work on it soon. At the moment I'm verifying the dense codes (OLD vs. NEW pyrknn) against each other, which will help narrow down where the error could be (possibly in the common portion). Comparing leaves with Chao's will be slightly tricky since they'll be in random order internally.
> Also, you can test it on a random dense dataset
I'm trying full dense now as a sanity check. If this fails, it will also effectively test the same thing.
Okay, test the dense code. (But I thought the dense code was broken...) Anyway, working with dense codes will be much better for debugging the tree construction.
The chart above is incorrect. I was only comparing accuracy of IDs in that old commit. For sparse data, with many points at the same distance from each other, this is inaccurate. Below is the correction (a point is counted correct if it is closer than the true kth nearest neighbor).
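The corrected metric can be sketched as follows. This is an illustrative reimplementation of the idea, not the actual pyrknn accuracy checker:

```python
import numpy as np

def knn_accuracy(approx_dist, true_dist, eps=1e-12):
    """Fraction of reported neighbors no farther than the true
    k-th nearest neighbor, per query.

    approx_dist: (n, k) distances from the approximate search
    true_dist:   (n, k) exact distances, sorted ascending

    Matching by ID alone undercounts on sparse data where many
    points sit at identical distances; comparing against the true
    k-th distance (with a small tolerance) treats ties fairly.
    """
    kth = true_dist[:, -1:]              # true k-th NN distance per query
    correct = approx_dist <= kth + eps   # within the true k-NN radius
    return correct.mean()
```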
For comparison, here are the corresponding runs on the new distributed code.
I'm not sure why 1 rank shows a difference at the moment; the code should be identical (same dataset, same leafsize).
Both show decay, but the decay in the new code is much more extreme. There may be more than one problem.
> Okay, test the dense code. (But I thought the dense code was broken...)
Dense CPU is broken because Parla was removed and I haven't gotten around to replacing it yet. Dense GPU is "working".
The fact that 1 rank works fine is not surprising at all. Any bugs that have to do with distributed memory are hidden.
I'm surprised that they're different by 10%, they should be closer.
Argh, well, part of this bug is an oversight in the new code: I incorrectly rounded the local tree levels when the number of points isn't a power of 2. Setting leafsize instead (and rounding down) fixes this.
This does not fix all of the problems (dense power-of-2 still shows some decay, and there is decay in the old code), but it will make the sparse results for the new code look a lot better.
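A sketch of the leafsize-based rounding, under the assumption that the local tree is a binary split (the function name is illustrative):

```python
import math

def local_tree_levels(n_points, leafsize):
    """Depth of a binary partitioning tree so that each leaf holds
    at least `leafsize` points.

    Rounding the level count up when n_points is not a power of two
    over-splits the data into too-small leaves; deriving the depth
    from leafsize and rounding down avoids that.
    """
    return max(0, math.floor(math.log2(n_points / leafsize)))
```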
I'll update the table when the full test finishes running. Hopefully it's close to the old code's performance.
> I'll update the table when the full test finishes running.
As expected it's looking a bit better but still showing an accuracy leak. Plan:
Is this sparse or dense?
This is sparse (URL); a dense table is on the Google Drive but not copied here.
Fixed. It was a combination of a few different things: tightening the precision of what I consider identical points in the k-select to be closer to machine epsilon; the leafsize bug mentioned above; the diminishing returns of performing multiple local iterations during a strong-scaling run; and, critically, a bug in checking the accuracy of nearest neighbors. That bug was not always there but, checking the repo, has been around for at least a few months.
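The "identical points" tolerance could be sketched as below. This is a hypothetical helper for illustration, not the actual k-select code, and the tolerance of a few float32 ulps is an assumption:

```python
import numpy as np

def nearly_equal_dist(a, b, rel_tol=8 * np.finfo(np.float32).eps):
    """Treat two distances as the same if they differ by only a few
    multiples of float32 machine epsilon.

    A loose tolerance merges genuinely distinct sparse points whose
    distances are merely close; tightening it toward machine epsilon
    keeps them separate in the k-select.
    """
    scale = np.maximum(np.abs(a), np.abs(b))
    return np.abs(a - b) <= rel_tol * np.maximum(scale, 1.0)
```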
Not closing this yet just in case I've missed something again. I'm running AVAZU jobs now to have another real dataset to verify this with. But checking the trees directly seemed to show good behavior for synthetic examples.
The bad news is that I don't know which version of the accuracy check was used for the NMSlib and FAISS comparisons we've shown before. Those need to be rerun.
Times are slow relative to the single-node C++ call for the sparse CPU search (Chao's code) shown because these runs use the GPU kernel. At each iteration, data is moved to and from the GPU, and the search within the kernel is about twice as slow as the CPU kernel.
AVAZU maintains accuracy over the first 4 iterations out to 16 ranks on both CPU and GPU, on the order of [0.16, 0.20, 0.23, 0.26, 0.3]. I'm closing this.
A previous AVAZU comparison was accidentally run with the wrong dataset: AVAZU 40M vs. AVAZU+Test Points 42M. These have very different convergence properties; AVAZU+Test converges significantly faster. I'm rerunning some old comparisons to see if we still have a speedup w.r.t. NMSlib. AVAZU w/ test converges like [0.23, 0.33, 0.36, 0.4, 0.43] for this leafsize (~600 ppL).
Accuracy drops dramatically on sparse scaling runs past the first few iterations.
Checking the partition on small (N<20) sparse matrices appears correct. The randomization in the distributed k-select only hits O(100) points per iteration in URL/AVAZU, so this shouldn't be the cause.