magwyz / pastec

Image recognition open source index and search engine
http://pastec.io
GNU Lesser General Public License v3.0

Can't install/setup #12

Closed by jerryharrison 9 years ago

jerryharrison commented 9 years ago

I've tried setting up and installing Pastec numerous times with no success, both on my Mac running 10.10.2 and on an EC2 instance running Ubuntu 14.

I keep getting the same error, saying that the following cannot be found:

LIBJSONCPP LIBRARY (ADVANCED)

I've tried installing jsoncpp with brew, cmake, cmake-gui, and make. I'm not sure where the issue lies but I can't get anywhere and am getting burnt trying to figure it out.

Any help/advice on getting pastec up and running would be greatly appreciated.

magwyz commented 9 years ago

Please try with Ubuntu 14.10. It should be straightforward on this version.

jerryharrison commented 9 years ago

Perfect. That worked. I've been playing around with it last night and this morning and I do have some thoughts/concerns.

Speed

Is there a preferred server setup in terms of CPU and RAM? I've noticed that submitting an image, analyzing it, and returning the result takes around 7 seconds on average, which is too slow for a production app.

I'm currently testing with only a small subset of my 40,000 images: only 241 images are stored in the index so far. The images used in the search are relatively small (the same size as the ones used to build the index). Adding images to the index is blazing fast and doesn't take much time at all.
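To narrow down where the 7 seconds go, it can help to time the search call several times and look at the spread rather than a single run. The snippet below is only a rough sketch: `time_call` and `fake_search` are hypothetical names, and `fake_search` is a stand-in that just sleeps; it would need to be replaced with whatever function actually sends an image to the Pastec server.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() several times and summarize the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_search():
    """Hypothetical placeholder for the real search request."""
    time.sleep(0.01)


stats = time_call(fake_search, repeats=3)
print(stats)
```

Running the same measurement from the server itself and from a remote client would show how much of the delay is upload time rather than processing time.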

Sending Datapoints vs Image Data

Also, do you have any plans or thoughts about sending just a hash or the data points to the server instead of the image? I'm thinking that the payload of the processed image data points would be smaller than the entire image. OpenCV could process the image "locally" and then hit the remote Pastec server.

Saving/Importing Data set/index

I've tried numerous times to save the current index and reload it when restarting Pastec, but it never works. It only works if I start Pastec, manually rebuild the index, save, clear, then reload it.

Processes I've tried:

  1. Start Pastec with savedindex.dat -> ./pastec savedindex.dat

or

  1. Start Pastec with visualWordsORB.dat -> ./pastec visualWordsORB.dat
  2. After visualWordsORB.dat is loaded, send a curl call to load savedindex.dat -> curl -X POST -d '{"type":"LOAD", "index_path":"savedindex.dat"}' http://localhost:4212/index/io
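
For anyone scripting this instead of using curl, the same request can be built in Python. This is a minimal sketch, not Pastec's own client: the helper names `build_index_io_request` and `send` are made up for illustration, and only the URL, port, and JSON payload are taken from the curl call above. The reply is returned as raw text because its format is not shown in this thread.

```python
import json
import urllib.request


def build_index_io_request(op_type, index_path, host="localhost", port=4212):
    """Build (but do not send) a POST request for the /index/io endpoint."""
    body = json.dumps({"type": op_type, "index_path": index_path}).encode("utf-8")
    url = f"http://{host}:{port}/index/io"
    return urllib.request.Request(url, data=body, method="POST")


def send(request, timeout=10):
    """Send the request and return the reply body as text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Same payload as the curl call above.
load_request = build_index_io_request("LOAD", "savedindex.dat")
print(load_request.get_method(), load_request.full_url)
print(load_request.data.decode("utf-8"))
```

With a Pastec instance listening on port 4212, `send(load_request)` would issue the call; nothing is sent until then.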

So far I think Pastec is pretty darn good, but I'd like to hear your thoughts on the above notes.

jerryharrison commented 9 years ago

So I've bumped the server up to 2 CPUs and it's sped up the processing from 7s to 4s, but it still takes a while.

magwyz commented 9 years ago

Hello Jerry,

Speed

The speed you are reporting is indeed very low. With a small index like that, it should be a lot quicker. I guess from what you are saying that you are using some kind of VM / cloud infrastructure. I strongly encourage you to run Pastec on a dedicated physical server. Pastec requires a good physical CPU to run fast (for example a Core 2 Duo at 3 GHz). It also accesses the RAM a lot (4 GB of RAM should be sufficient for a 40,000 image index). I am not sure you can get the same CPU and RAM performance with a virtual infrastructure. Personally, I only host Pastec instances on physical servers.

Sending Datapoints vs Image Data

Computing the image features on the mobile device looks like a good idea at first sight. There are, however, several problems to solve. First, extracting the features on the device would be slow (several seconds) unless it is optimized with some NEON code. Besides, depending on how it is done, it may also require loading the 30 MB visualWordsORB.dat file on the device. Finally, if not compressed, the image feature data may also be bigger than a small 30 KB image, which is sufficient for recognition (see the documentation of the "Search request" API call).
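
As a back-of-the-envelope check on the last point, the size of an uncompressed feature payload can be estimated as below. This is only a sketch under stated assumptions: an ORB descriptor is 256 bits (32 bytes), but the 4 bytes per keypoint position and the feature counts are illustrative guesses, not values taken from Pastec.

```python
ORB_DESCRIPTOR_BYTES = 32  # an ORB descriptor is 256 bits
KEYPOINT_BYTES = 4         # assumption: x and y each sent as a 16-bit integer


def raw_feature_payload_bytes(num_features):
    """Uncompressed payload size for num_features descriptors plus positions."""
    return num_features * (ORB_DESCRIPTOR_BYTES + KEYPOINT_BYTES)


image_bytes = 30 * 1000  # the "small 30 KB image" mentioned above

# Illustrative feature counts; the real number per image is not stated here.
for n in (500, 1000, 2000):
    size = raw_feature_payload_bytes(n)
    print(n, size, size > image_bytes)
```

Under these assumptions, 1000 features already come to 36,000 bytes, more than the 30 KB image, while 500 features (18,000 bytes) would still be smaller.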

Saving/Importing Data set/index

I think you are mixing up two different types of .dat files. This is misleading because they have the same extension.

I agree all this needs to be improved.