magwyz / pastec

Image recognition open source index and search engine
http://pastec.io
GNU Lesser General Public License v3.0
620 stars, 175 forks

Add In-Index Similarity Search #23

Closed jeresig closed 8 years ago

jeresig commented 8 years ago

Thank you so much for creating the amazing Pastec project, @magwyz! This pull request adds a new API endpoint:

$ curl http://127.0.0.1:4212/index/images/1221010341
{"bounding_rects":[{"height":356,"width":291,"x":34,"y":31},{"height":315,"width":290,"x":34,"y":71}],"image_ids":[1221010341,2694417911],"scores":[577,78],"type":"SEARCH_RESULTS"}

When you access it with the ID of an image that is already in the index, it returns the set of similar images that are also in the index. This means you no longer have to re-upload an image to see which images are similar to it. Depending on network latency and the size of the image, this can improve performance.
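As a rough illustration of consuming this endpoint from a client, here is a minimal Python sketch. The URL and the response shape are taken from the curl example above; the helper names `parse_search_results` and `similar_images` are made up for this example and are not part of Pastec.

```python
import json
from urllib.request import urlopen

# Sample response copied from the curl example above.
SAMPLE = (
    '{"bounding_rects":[{"height":356,"width":291,"x":34,"y":31},'
    '{"height":315,"width":290,"x":34,"y":71}],'
    '"image_ids":[1221010341,2694417911],"scores":[577,78],'
    '"type":"SEARCH_RESULTS"}'
)


def parse_search_results(payload):
    """Turn the parallel arrays of a SEARCH_RESULTS response into one dict per match."""
    data = json.loads(payload)
    if data.get("type") != "SEARCH_RESULTS":
        raise ValueError(f"unexpected response type: {data.get('type')!r}")
    return [
        {"image_id": image_id, "score": score, "rect": rect}
        for image_id, score, rect in zip(
            data["image_ids"], data["scores"], data["bounding_rects"]
        )
    ]


def similar_images(image_id, base_url="http://127.0.0.1:4212"):
    """Query the in-index search endpoint (hypothetical helper; needs a running server)."""
    with urlopen(f"{base_url}/index/images/{image_id}") as response:
        return parse_search_results(response.read().decode("utf-8"))


# Parse the sample offline so the sketch runs without a Pastec server.
matches = parse_search_results(SAMPLE)
for match in matches:
    print(match["image_id"], match["score"], match["rect"])
```

In the sample response the first match is the queried image itself (1221010341), so a caller would probably want to filter out the ID it asked about.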

Additionally, an optional image -> word cache is added (enabled with the command-line option --cache-words) to dramatically improve performance at the expense of memory usage.
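To make the trade-off concrete, here is a toy Python sketch of an image -> word cache sitting next to a word -> image index. It is an illustration only: the `ToyIndex` class, its methods, and the scan used when the cache is off are assumptions made for this example, not Pastec's actual C++ data structures or scoring.

```python
from collections import defaultdict


class ToyIndex:
    """Toy visual-word index; an illustration only, not Pastec's implementation."""

    def __init__(self, keep_forward_index=False):
        # word id -> ids of the images that contain that word
        self.inverted = defaultdict(set)
        # image id -> word ids; only kept when the cache is enabled
        self.forward = {} if keep_forward_index else None

    def add(self, image_id, words):
        for word in words:
            self.inverted[word].add(image_id)
        if self.forward is not None:
            self.forward[image_id] = set(words)  # costs extra memory per image

    def words_of(self, image_id):
        if self.forward is not None:
            return self.forward[image_id]  # direct lookup
        # Without the cache, recover the words by scanning every posting list.
        return {w for w, images in self.inverted.items() if image_id in images}

    def similar(self, image_id):
        """Rank the other indexed images by the number of words they share."""
        scores = defaultdict(int)
        for word in self.words_of(image_id):
            for other in self.inverted[word]:
                if other != image_id:
                    scores[other] += 1
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


cached = ToyIndex(keep_forward_index=True)
plain = ToyIndex()
for index in (cached, plain):
    index.add(1, [10, 11, 12])
    index.add(2, [11, 12, 13])
    index.add(3, [13, 14])

print(cached.similar(2))
print(plain.similar(2))
```

Both indexes return the same ranking; the cached one skips the scan over all posting lists and pays for that with one extra set of word IDs per image.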

| Type | Time to Respond | Memory Usage |
| --- | --- | --- |
| Cached In-Index Search | 1.02-1.14s | 877MB |
| Un-cached In-Index Search | 2.40-2.71s | 625MB |
| Image Upload Search | 3.43-3.55s | n/a |

This is with an index of 59,041 images at 419MB.
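For a quick sense of scale, the figures in the table above can be reduced to a speedup and a per-image memory cost. This is back-of-the-envelope arithmetic only: using the midpoint of each time range and treating 1MB as 1000KB are assumptions of this sketch.

```python
# Figures copied from the table above; response times are (min, max) in seconds.
timings = {
    "cached": (1.02, 1.14),
    "uncached": (2.40, 2.71),
    "upload": (3.43, 3.55),
}
memory_mb = {"cached": 877, "uncached": 625}
indexed_images = 59_041


def midpoint(bounds):
    """Midpoint of a (min, max) range; a simplification, not a measured mean."""
    return sum(bounds) / 2


cached_time = midpoint(timings["cached"])
speedup_vs_uncached = midpoint(timings["uncached"]) / cached_time
speedup_vs_upload = midpoint(timings["upload"]) / cached_time

extra_memory_mb = memory_mb["cached"] - memory_mb["uncached"]
extra_kb_per_image = extra_memory_mb * 1000 / indexed_images  # assumes 1MB = 1000KB

print(f"speedup vs un-cached: {speedup_vs_uncached:.2f}x")
print(f"speedup vs upload:    {speedup_vs_upload:.2f}x")
print(f"extra memory:         {extra_memory_mb}MB ({extra_kb_per_image:.2f}KB per image)")
```

With these numbers the cache costs 252MB of extra memory, about 4.3KB per indexed image, and answers roughly 2.4 times faster than the un-cached in-index search.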

I'm sure many improvements can be made to this code. This is my first time writing C++ in many years, so feedback is most appreciated! I'm planning on contributing a number of other pull requests as well, namely: configuring the maximum number of occurrences for a word, setting a string name for an image instead of a number, and setting a default index location.

(This branch unfortunately includes @ryanfb's Mac-platform pull request #21, as I needed it to get it to build on my copy of OSX.)

magwyz commented 8 years ago

Hello John,

Thank you very much for your interesting contribution. Being a daily user of jQuery, I am honored to receive patches from you! I will look carefully at your patches and come back with comments. For your information, I am working on a tag system that lets Pastec associate a string with each image.

magwyz commented 8 years ago

Hello John,

I just have general comments.

Thanks!

jeresig commented 8 years ago

@magwyz Great call about the name and documentation. I've renamed the option to --forward-index and have renamed a number of the variables and method arguments as well. I've also merged with master to make sure I'm current.

Let me know if I can help with the documentation at all. Maybe moving a Markdown copy to Github might be useful?

magwyz commented 8 years ago

Thank you, John, for your contribution! Moving a Markdown copy of the documentation to Github would definitely make sense. It is on my TODO list.

jeresig commented 8 years ago

@magwyz Thank you so much for merging this -- I'm very happy to contribute! I will be sending some more pull requests your way quite soon.