Overlap - Githubissues

jlmelville / rnndescent

R package implementing the Nearest Neighbor Descent method for approximate nearest neighbors

https://jlmelville.github.io/rnndescent/

GNU General Public License v3.0

10 stars 2 forks source link

Overlap #5

Closed vspinu closed 2 years ago

vspinu commented 3 years ago

POC. Needs tests.

vspinu commented 3 years ago

This doesn't seem to be in the official definition of the Hamming distance though (and also adds a division). I don't see a compelling reason to add the normalization, do you agree?

Yes. I agree. Normalization is super easy outside of the routine and I would rather have it as part of more generic #3 weighting feature.

Should this version become the implementation for "hamming"? Maybe the binary-only version can be renamed to "bhamming".

Yes, that would make more sense to me. The package is still on CRAN and I guess not widely used as yet, so breaking should be fine.

What kind of data is this intended for? Integers?

Yes. I didn't want to add a separate entry point for integers but I think it should be done to avoid the implicit Rcpp conversion.

Do you have any sample data with expected input and output for testing?

No, but I will create once we agree on the final.

Do you want to add yourself as a contributor in the Authors@R section of the DESCRIPTION?

Thanks, but I think this small change is not worthy of authorship.

vspinu commented 3 years ago

I have renamed orverlap->hamming and hamming->bhamming and added tests.

vspinu commented 3 years ago

Because of the different context you will get an output like

✖ |   8 1     | NN descent bhamming
───────────────────────────────────────────────────────────────────────────────────────────
Failure (test_descent_hamming.R:42:3): (code run outside of `test_that()`)
sum(bitdata) not equal to 791.
1/1 mismatches
[1] 790 - 791 == -1

Looks as clear as it can be to me.

For the loops in the other places I can add label argument to the expect_equal so the output will look like

Failure (test_brute_force.R:76:3): (code run outside of `test_that()`)
hamming: not equal to `bit6q_hdsum`.
1/1 mismatches
[1] 1275 - 1986 == -711

Not a big deal anyhow, whatever you choose. Just want to double check.

jlmelville commented 3 years ago

My experience with tests in loops is that when something goes wrong with one of the loop items, you end up having to manually unroll the loop to separate the passing and failing items. So I now prefer to not have a loop at all.

vspinu commented 3 years ago

Ok. I have unrolled all the loops.

jlmelville commented 2 years ago

@vspinu I have integrated the changes from this PR manually. I have also added you as a contributor in the DESCRIPTION. Thanks for the PR, sorry it took so long to get this merged.

vspinu commented 2 years ago

@jlmelville Sorry for not coming back on this one. Vacation came in and after I couldn't find time to revisit it.

Unfortunately we also moved away from the nearest neighbor in our project and I might not have enough drive to pursue other improvements that we have discussed here and there. I will let you know if I do :)