rhsimplex / image-match

🎇 Quickly search over billions of images

Elasticsearch 5 compatibility? #28

Closed: rezaxdi closed this issue 7 years ago

rezaxdi commented 8 years ago

Hi,

I tried out image-match, but I get this error when searching the database for an image: `elasticsearch.exceptions.RequestError: TransportError(400, u'parsing_exception', u'no [query] registered for [filtered]')`

I'm using Elasticsearch 5. Is it supported?

rhsimplex commented 8 years ago

Hi @rezaxdi,

Thanks for the heads up. We don't support 5.0 yet, but I'll try to get ahead of it. The issue is in elasticsearch_driver.py where we construct a query like:

res = self.es.search(...
                     body={'query':
                               {'filtered': {
                                   'query': {
                                       'bool': {'should': should}
                                   }
                               }}},
                     ...)

According to the Elasticsearch 5.0 breaking changes, the `filtered` query (deprecated since 2.0) has been removed, so this query will need to be rewritten.
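For reference, the usual migration for a `filtered` query is a `bool` query whose `must` clause holds the old `query` and whose `filter` clause holds the old `filter`; when there is no filter, the inner query can be used on its own. Below is a minimal sketch of that rewrite on a plain request body, assuming the body is a Python dict shaped like the one above. `migrate_filtered` is a hypothetical helper (not part of image-match or elasticsearch-py), and the `term` clause is only a placeholder for whatever image-match actually puts in `should`.

```python
def migrate_filtered(body):
    """Return a copy of an ES 2.x search body with a top-level `filtered`
    query rewritten into an ES 5-compatible form.

    Hypothetical helper for illustration; not part of image-match.
    """
    new_body = dict(body)  # shallow copy so the caller's dict is untouched
    query = body.get('query', {})
    if 'filtered' not in query:
        return new_body  # nothing to migrate

    filtered = query['filtered']
    # A `filtered` query without a `query` part defaulted to match_all.
    inner = filtered.get('query', {'match_all': {}})
    if 'filter' in filtered:
        # A real filter: keep it in a non-scoring `filter` clause.
        new_body['query'] = {'bool': {'must': inner, 'filter': filtered['filter']}}
    else:
        # No filter: the `filtered` wrapper did nothing, so use the inner query.
        new_body['query'] = inner
    return new_body


should = [{'term': {'simple_word_0': 42}}]  # placeholder clause (hypothetical)
old_body = {'query': {'filtered': {'query': {'bool': {'should': should}}}}}
print(migrate_filtered(old_body))
# {'query': {'bool': {'should': [{'term': {'simple_word_0': 42}}]}}}
```

Because the image-match query has no filter, the sketch reduces it to a plain `bool`/`should` query.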

rezaAdie commented 8 years ago

Is there no new version of image-match for Elasticsearch 5?

rhsimplex commented 8 years ago

No, I haven't had time to work on this. I can help you if you'd like to work on it, but I can't say when I'll have time to do it myself, sorry.

rezaAdie commented 8 years ago

Can you recommend another project with similar functionality that supports Elasticsearch 5? And if I want to modify image-match myself, what do I need to change?

rezaAdie commented 8 years ago

Hi @rhsimplex, I've changed the query format and it runs now. When I run `ses.search_image('https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg')`, it returns results like in the example. But when I run `ses.search_image('http://192.168.20.35:8090/Ct5S39KUIAApAK4.jpg')`, it only returns the image itself, not other similar images: `[{'path': u'http://192.168.20.35:8090/Ct5S39KUIAApAK4.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVfSovgyWtLzjc9Z-DcQ', 'metadata': None}]`

rhsimplex commented 8 years ago

Can you show me the query?

rezaAdie commented 8 years ago

It's like this: `{'query': { 'bool': { 'must': { 'bool': {'should': should} } } }, '_source': {'excludes': ['simple_word_*']} }`. I fixed it using the Elasticsearch 5 documentation.

rhsimplex commented 8 years ago

Sorry for the delay, I'll try to look into it soon.

nofxx commented 7 years ago

@rezaAdie's code works here, but I only did some very naive tests...

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html

body={'query': {
          'bool': {
              'must': {
                  'bool': {'should': should}
              }
          }
      },
      '_source': {'excludes': ['simple_word_*']}
      },

I'm going to try MongoDB too and compare results. Out of curiosity, why Elasticsearch? Is there that much of a difference? Performance? Fuzziness? Sorry, I haven't fully understood this algorithm yet...

rhsimplex commented 7 years ago

@nofxx @rezaAdie thanks for checking, would you be willing to make a PR with the change? I haven't had much time to work on this lately, so that would be a huge help. I can add tests if necessary.

We used ES because it can support a much faster insertion rate. We were initially using image-match in conjunction with a high-volume web crawler.

I haven't maintained the MongoDB wrapper, let me know if it even still works.

char101 commented 7 years ago

`{'query': {'bool': {'should': should}}}` should be enough; why nest it inside a `must` query?

The original filtered query is unnecessary, since there is no filter in it.
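To illustrate why the extra `must` layer is redundant, here is a minimal sketch of a hypothetical `collapse_single_must` helper (not part of image-match) that unwraps a `bool` query whose only clause is a single `must` subquery. Applied to the nested body from the thread, it yields the flat form suggested above. The `term` clause in `should` is a placeholder, not what image-match necessarily generates.

```python
def collapse_single_must(query):
    """Recursively unwrap {'bool': {'must': X}} into X when `must` is the
    bool query's only clause and holds exactly one subquery.

    Hypothetical helper for illustration; not part of image-match.
    """
    if not isinstance(query, dict):
        return query
    bool_q = query.get('bool')
    if len(query) == 1 and isinstance(bool_q, dict) and list(bool_q) == ['must']:
        must = bool_q['must']
        if isinstance(must, list) and len(must) == 1:
            must = must[0]  # a one-element clause list is also a single subquery
        if isinstance(must, dict):
            return collapse_single_must(must)
    return query  # anything else (several clauses, a filter, ...) is kept as-is


should = [{'term': {'simple_word_0': 42}}]  # placeholder clause (hypothetical)
nested = {'bool': {'must': {'bool': {'should': should}}}}
body = {'query': collapse_single_must(nested),
        '_source': {'excludes': ['simple_word_*']}}
print(body['query'])  # {'bool': {'should': [{'term': {'simple_word_0': 42}}]}}
```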

wac81 commented 7 years ago

@char101 It doesn't work for web URLs.

wac81 commented 7 years ago

If I use a local path, I get results, but every score is 63:

[{'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhx1DpmEaCo1yV3vBzc', 'metadata': None},
 {'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhxxN1rEaCo1yV3vBy5', 'metadata': None},
 {'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhx09vwEaCo1yV3vBzW', 'metadata': None},
 {'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhx13ZSEaCo1yV3vBzo', 'metadata': None},
 {'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhxxYgVEaCo1yV3vBzF', 'metadata': None},
 {'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhx2s6OEaCo1yV3vBzw', 'metadata': None},
 {'path': u'./imagedatabase/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhxxTGVEaCo1yV3vBy', 'metadata': None},
 {'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhx06ccEaCo1yV3vBzQ', 'metadata': None},
 {'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhx1LkoEaCo1yV3vBzi', 'metadata': None}]

rhsimplex commented 7 years ago

Ok, I made a PR #49 that uses @nofxx's version of the query, inspired by: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html

If I can get a couple of you to :+1: the PR, I'll merge it.

char101 commented 7 years ago

@wac81 How does the `must` query affect remote versus local sources?

A boolean query is a compound query, so it only makes a difference when it combines at least two subqueries.

char101 commented 7 years ago

This URL https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html is irrelevant because, as I explained before, a filtered query requires a filter. There is no filter in the original query, so it is just a normal bool query.

char101 commented 7 years ago

@wac81 The score value does not matter (it is a tf*idf-style relevance score, not the image similarity distance); what matters is the dist value, where 0 means an exact match. From what I see, it works with your local images, since all the dist values are 0.
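Following that explanation, a minimal sketch of post-processing a result list: rank by `dist`, ignore `score`, and drop repeated paths (the output above shows the same file indexed several times under different ids). The dict keys (`path`, `dist`, `id`, `score`) match the output posted in this thread; the helper name `split_by_distance`, the sample entries, and the 0.45 cutoff are made up for illustration and are not image-match defaults or results.

```python
def split_by_distance(results, max_dist):
    """Split search results into exact matches (dist == 0) and near matches
    (0 < dist <= max_dist), sorted by ascending distance, one entry per path.

    `score` is deliberately ignored: it is Elasticsearch relevance, not image
    similarity. Hypothetical helper for illustration.
    """
    unique, seen = [], set()
    for r in sorted(results, key=lambda r: r['dist']):
        if r['path'] not in seen:  # same image indexed more than once
            seen.add(r['path'])
            unique.append(r)
    exact = [r for r in unique if r['dist'] == 0.0]
    near = [r for r in unique if 0.0 < r['dist'] <= max_dist]
    return exact, near


# Made-up sample results in the shape shown in the thread.
results = [
    {'path': './image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': 'a', 'metadata': None},
    {'path': './image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': 'd', 'metadata': None},
    {'path': './image_database/5.jpg', 'score': 12.0, 'dist': 0.31, 'id': 'b', 'metadata': None},
    {'path': './image_database/6.jpg', 'score': 40.0, 'dist': 0.72, 'id': 'c', 'metadata': None},
]
exact, near = split_by_distance(results, max_dist=0.45)  # illustrative threshold
print([r['path'] for r in exact])  # ['./image_database/3.jpg']
print([r['path'] for r in near])   # ['./image_database/5.jpg']
```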

wac81 commented 7 years ago

But I added 6 images, and I only get 3.jpg back: {'path': u'./image_database/3.jpg', 'score': 63.0, 'dist': 0.0, 'id': u'AVhxxN1rEaCo1yV3vBy5', 'metadata': None}

How can I adjust it to get distances for the other images?

wac81 commented 7 years ago

@char101

rhsimplex commented 7 years ago

I'll go ahead and merge with @char101's suggested changes, since it passes all the tests.

@wac81 I can't understand what the issue is exactly. If you're able to reproduce it with the latest master branch, please open a new issue with the exact input and output and we'll take it from there.

Thanks everyone!

char101 commented 7 years ago

@rhsimplex Thanks, but I think you still need to change `exclude` to `excludes` to be compatible with Elasticsearch 5.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html
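A minimal sketch of that change, assuming the request body is built as a plain dict like the snippets above: the hypothetical `normalize_source_filter` helper (not part of image-match) renames the singular `include`/`exclude` keys of a `_source` filter to the plural `includes`/`excludes` keys used on the source-filtering page, and leaves non-dict `_source` values (such as `False`) alone.

```python
def normalize_source_filter(body):
    """Return a copy of a search body whose `_source` filter uses the plural
    `includes`/`excludes` keys instead of `include`/`exclude`.

    Hypothetical helper for illustration; not part of image-match.
    """
    new_body = dict(body)  # shallow copy so the caller's dict is untouched
    source = body.get('_source')
    if isinstance(source, dict):  # `_source` may also be a bool, str, or list
        renamed = {'include': 'includes', 'exclude': 'excludes'}
        new_body['_source'] = {renamed.get(k, k): v for k, v in source.items()}
    return new_body


old = {'query': {'bool': {'should': []}}, '_source': {'exclude': ['simple_word_*']}}
print(normalize_source_filter(old)['_source'])  # {'excludes': ['simple_word_*']}
```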