pixelogik / NearPy

Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive hashes.
MIT License
759 stars 152 forks source link

Add permutation module #20

Closed shixing closed 9 years ago

shixing commented 9 years ago

Hi, I've add a new module permutation, which aims to find the neighbour bucket keys. I've add two new methods in engine.py: 1) build_permuted_index(), which build a permutedIndex for each binary hashes. 2) neighbours_p(), which is similar to neighbours, except it find the neighbour keys of the matching bucket keys. I've write a example in nearpy.tests.permutation_tests. It will compare the neighbours returned by neighbours(), neighbours_p() and the real nearest neighbours.

If you have any questions or coding format issues, feel free to contact me at github or xingshi@usc.edu

pixelogik commented 9 years ago

Cool thing ! Give me some days to find time to check the new stuff!

pixelogik commented 9 years ago

Sorry for delay, I will do the check / pull today in the evening.

pixelogik commented 9 years ago

Hi,

first of all, thanks for the contribution! The idea is very cool.

I refactored the code a bit to keep the Engine minimal. Your new permutation component is now a meta-hash called HashPermutations, than can be used with the Engine like all other hashes are used.

The only difference is that this meta-hash uses child hashes for actual hashing, done with your permutation component.

How this is done can be seen in the changed test of yours:

    # Create permutations meta-hash
    self.permutations = HashPermutations('permut')

    # Create binary hash as child hash
    rbp = RandomBinaryProjections('rbp1', 4)
    rbp_conf = {'num_permutation':50,'beam_size':10,'num_neighbour':100}

    # Add rbp as child hash of permutations hash
    self.permutations.add_child_hash(rbp, rbp_conf)

    # Create engine with meta hash and cosine distance
    self.engine_perm = Engine(200, lshashes=[self.permutations], distance=CosineDistance())

Before merging this branch into the master it would be cool if you could check it out. I would then merge into master, add some more documentation to it and also support for save/load of hash configs to storages.

Thanks again, hope my changes are fine for you.

pixelogik commented 9 years ago

I forgot to mention: The changed code is in the shixing-feature-permutation branch of my NearPy

shixing commented 9 years ago

That's a great refactoring. You did better than me. I really spend some time trying to incorporate the code into the framework, but yours is just better and clean.

pixelogik commented 9 years ago

Thanks. I think this is natural because I designed the library so it is easier for me to fit new stuff into the existing architecture.

Thanks again for your great contribution! I will soon merge it into the master and release a new version of NearPy. After that I will make sure that managing the development is easier by using tags for release versions and using a develop branch.

If you have any more feedback or ideas on the library while you work with it please feel free to talk about it. The more people contribute, the better it gets :)