magwyz / pastec

Image recognition open source index and search engine
http://pastec.io
GNU Lesser General Public License v3.0

Looking for some help. What exactly is the index and how is it used? #10

Closed showcasefloyd closed 9 years ago

showcasefloyd commented 9 years ago

I'm working on a project for personal use and education, and I've been able to compile and run Pastec with no issues. Adding and searching images works great, but I'm not sure what the "save an index", "load an index" and "clear the index" API calls are for. Why are they needed? Any information (or even links to documentation that I can read) would be a great help. Just trying to learn.

Thanks so much in advance,

Floyd

chrishein commented 9 years ago

@showcasefloyd Whenever you submit an image to be added to the index, Pastec processes the file to extract features (a signature) that allow searching for it later. The server keeps this index of signatures in memory while it is running. If you shut down the server, the index is lost, and when you start it again you must resubmit all images so that the index is rebuilt. Saving the index dumps all the information it contains to a file on disk. This lets you restart the server and load the index from that file, without needing to submit each image again. Clearing the index removes all currently indexed images from Pastec; that is, it empties the in-memory index.
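To make the three operations concrete, here is a minimal sketch in Python that builds the corresponding HTTP requests without sending them. The `/index/io` path, the `WRITE` / `LOAD` / `CLEAR` type values and the `index_path` field are assumptions based on the Pastec API documentation as I remember it, and `backup.dat` is a hypothetical file name, so check them against the docs for your version.

```python
import json
import urllib.request

# Host and port match the curl example that appears later in this thread.
PASTEC_URL = "http://127.0.0.1:4212"


def index_io_request(action, index_path=None):
    """Build (but do not send) a request for the index I/O endpoint.

    Assumption: the server accepts POST /index/io with a JSON body whose
    "type" is WRITE, LOAD or CLEAR, plus "index_path" for WRITE and LOAD.
    """
    body = {"type": action}
    if index_path is not None:
        body["index_path"] = index_path
    return urllib.request.Request(
        PASTEC_URL + "/index/io",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )


save_req = index_io_request("WRITE", "backup.dat")   # dump the in-memory index to disk
load_req = index_io_request("LOAD", "backup.dat")    # restore it after a restart
clear_req = index_io_request("CLEAR")                # empty the in-memory index

# To send one of them to a running server:
# with urllib.request.urlopen(save_req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch runnable even when no Pastec server is listening.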

showcasefloyd commented 9 years ago

@chrishein First, thanks so much for your answer. This helps me a ton. So it sounds like this is Pastec's way of storing its data, then. As a rule, do you save the index every time a new image is submitted to it? Also, are there best practices for how to use the index, and if possible, can you give me a scenario where I would need to clear it?

Floyd

chrishein commented 9 years ago

@showcasefloyd Regarding rules about saving the index, it all depends on the usage scenario. Saving the index to disk, especially a large one, takes time and resources, and it will probably block access to the service while it runs. So doing it very often can be a problem.

Clearing the index can be useful when you need to start from scratch, for example during development and testing, or when loading a different index because the usage scenario has changed. Again, it all depends on your specific use case.
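One way to avoid saving after every submission is to batch saves: write the index only once enough new images have accumulated or enough time has passed. The sketch below is a client-side policy, not part of Pastec; the class name `SavePolicy` and the thresholds are hypothetical and should be tuned to how long a save takes on your index.

```python
import time


class SavePolicy:
    """Decide when a save is due: after N additions or T seconds, whichever comes first."""

    def __init__(self, max_pending=100, max_age_seconds=600.0, clock=time.monotonic):
        self.max_pending = max_pending          # hypothetical threshold: unsaved images
        self.max_age_seconds = max_age_seconds  # hypothetical threshold: seconds since last save
        self.clock = clock
        self.pending = 0
        self.last_save = clock()

    def record_addition(self):
        """Call after each image is successfully added to the index."""
        self.pending += 1

    def should_save(self):
        """Return True when there is unsaved work and a threshold has been reached."""
        if self.pending == 0:
            return False
        too_many = self.pending >= self.max_pending
        too_old = self.clock() - self.last_save >= self.max_age_seconds
        return too_many or too_old

    def mark_saved(self):
        """Call after the index has been written to disk."""
        self.pending = 0
        self.last_save = self.clock()
```

In use, the application would call `record_addition()` after each upload, check `should_save()`, and trigger the save request followed by `mark_saved()` only when it returns true. Images added since the last save are lost if the server stops, so the thresholds bound how much would have to be resubmitted.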

showcasefloyd commented 9 years ago

@chrishein Okay, thanks again. I think I understand now. By the way, are there any limitations or performance issues I need to think about in terms of the index? For example, if I have a million images indexed, will there be a huge performance hit when I search against it? If so, are there techniques I should consider when designing my app?

showcasefloyd commented 9 years ago

I have another noob question. Is there a way to actually list or query what's in the index? Basically, if we need to know whether an image has already been added to the index, is there a way to find it?

Floyd

magwyz commented 9 years ago

On 31/03/2015 16:11, showcasefloyd wrote:

> I have another noob question. Is there a way to actually list / query what's in the index? Basically, if we have to know if an image has already been added to the index, is there a way to find it?

There is currently no API call that lists the images in the index. However, it would be easy to implement.


showcasefloyd commented 9 years ago

@magwyz Yes, this would be very useful! Thanks so much for all your hard work. Pastec is amazing.

magwyz commented 9 years ago

A new API call that lists the image ids in the index has been added by the commit 748b979991f98cea97973da31be1083a5c1c1eb2.

Example:

~$ curl -X GET http://127.0.0.1:4212/index/imageIds
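With this call, the earlier question about checking whether an image is already indexed can be answered on the client side. The sketch below parses the response and tests membership. The response shape (a JSON object with an `image_ids` array) is an assumption that is not confirmed in this thread, and the helper names are hypothetical, so inspect the real output of the curl command first.

```python
import json


def parse_image_ids(response_text):
    """Extract the set of image ids from the response body.

    Assumed shape: {"image_ids": [1, 2, 3], "type": "INDEX_IMAGE_IDS"}
    """
    payload = json.loads(response_text)
    return set(payload.get("image_ids", []))


def is_indexed(image_id, response_text):
    """Return True if image_id appears in the listed ids."""
    return image_id in parse_image_ids(response_text)


# Hand-written sample standing in for a real server response.
sample = '{"image_ids": [1, 5, 9], "type": "INDEX_IMAGE_IDS"}'
print(is_indexed(5, sample))  # True
print(is_indexed(2, sample))  # False

# Against a running server, the text could be fetched with:
# import urllib.request
# with urllib.request.urlopen("http://127.0.0.1:4212/index/imageIds") as resp:
#     response_text = resp.read().decode("utf-8")
```

For a large index it is probably cheaper to fetch the id list once and cache the set than to request it before every upload.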