liberation / django-elasticsearch

Simple wrapper around elasticsearch-py to index/search a django Model.
MIT License

How to make .filter(field__in=[value1, value2]) #50

Open onegreyonewhite opened 8 years ago

onegreyonewhite commented 8 years ago

It looks like this is possible: Finding Multiple Exact Values

lauxley commented 8 years ago

It works: https://github.com/liberation/django-elasticsearch/blob/master/django_elasticsearch/tests/test_qs.py#L172

onegreyonewhite commented 8 years ago

That is quite different. I am not asking about the Python "in" operator. I want to get the matching results. Your example only checks whether an object occurs in the results. I asked about a construct like SQL "field IN (...)".

lauxley commented 8 years ago

OK, I might have closed this issue a bit hastily. Did you try the __contains lookup? If it doesn't work, I will need a complete use case to make sure it's not feasible before adding new functionality.

onegreyonewhite commented 8 years ago

For example, I have a Params model. If I call Params.es.filter(obj__contains=["1","2"]), everything looks normal. But when I call Params.es.filter(obj__contains=["1","2"], time__gte="now-1m"), I get:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/django_elasticsearch/query.py", line 76, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/usr/local/lib/python2.7/dist-packages/django_elasticsearch/query.py", line 97, in __getitem__
    self.do_search()
  File "/usr/local/lib/python2.7/dist-packages/django_elasticsearch/query.py", line 270, in do_search
    r = es_client.search(**search_params)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 548, in search
    doc_type, '_search'), params=params, body=body)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 329, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 109, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 108, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
RequestError: TransportError(400, u'search_phase_execution_exception', u'No query registered for [must]')

onegreyonewhite commented 8 years ago

My model classes:

class MainEsIndexable(EsIndexable):
    id = models.CharField(max_length=25, primary_key=True, default=random_string)
    time = models.DateTimeField(default=timezone.now)
    server = models.CharField(max_length=255)
    param = models.CharField(max_length=255)
    obj = models.CharField(max_length=255, default="None")

    class Meta:
        abstract = True

    def __init__(self, *args, **kwargs):
        super(MainEsIndexable,self).__init__(*args, **kwargs)
        self._synced = False

    def save(self, *args, **kwargs):
        self.es.do_index()

    def delete(self, *args, **kwargs):
        self.es.delete()

class Params(MainEsIndexable):
    value = models.CharField(max_length=255, default="0")

    class Elasticsearch(EsIndexable.Elasticsearch):
        serializer_class = ParamEsSerializer
        fields = ['time','server','param','obj','value', 'id']
        index = 'monitd-params'
        doc_type = 'param'
        mappings = {
            "_id" : { "path" : "id" },
            "server": {"index": "not_analyzed"},
            "param": {"index": "not_analyzed"},
            "obj": {"index": "not_analyzed"},
            "time": { "type": "date" }
        }

    class Meta:
        managed = False

For the query Params.es.filter(obj__contains=[str(j) for j in range(7)], time__gte="now-1m", server="192.168.0.251", param="lmTempSensorsValue"), only results with the last obj value (6) are returned. The generated query:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "time": {
                  "gte": "now-1m"
                }
              }
            },
            {
              "term": {
                "param": "lmTempSensorsValue"
              }
            },
            {
              "term": {
                "server": "192.168.0.251"
              }
            }
          ]
        }
      },
      "query": {
        "match": {
          "obj": {
            "query": [
              "0",
              "1",
              "2",
              "3",
              "4",
              "5",
              "6"
            ]
          }
        }
      }
    }
  }
}

But the results should include obj values 1-7.
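
For comparison, here is a minimal hand-written sketch of the same request with the obj condition expressed as a "terms" filter inside the bool/must list, along the lines of "Finding Multiple Exact Values". It is not what django-elasticsearch generates. It assumes the Elasticsearch 1.x/2.x "filtered" query syntax shown above and that obj is mapped as not_analyzed, as in the model's mappings; the variable names are illustrative.

```python
import json

# Values to match exactly; obj is a CharField, so they are strings.
obj_values = [str(j) for j in range(1, 8)]  # "1" through "7"

body = {
    "query": {
        "filtered": {
            # No scoring query is needed; every condition is an exact filter.
            "query": {"match_all": {}},
            "filter": {
                "bool": {
                    "must": [
                        {"range": {"time": {"gte": "now-1m"}}},
                        {"term": {"param": "lmTempSensorsValue"}},
                        {"term": {"server": "192.168.0.251"}},
                        # "terms" keeps documents whose obj equals any listed value.
                        {"terms": {"obj": obj_values}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(body, indent=2))
```

Sending this body directly to Elasticsearch is one way to check whether the incomplete results come from the query shape or from the indexed data.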

lauxley commented 8 years ago

range(7) returns a 0-indexed list, so it looks normal to me! But does PR #51 fix the query syntax error?

onegreyonewhite commented 8 years ago

The query is normal, but the results are not. I have data with obj values in the range 1-7, but the results only contain the obj value "6".

PR #51 solves all the previous problems, but the query results are incomplete.

lauxley commented 8 years ago

I'm not sure I understand what you mean. If you want documents with "obj" being equal to "7", you can use range(1, 8)? But I doubt that is really the problem here, and I'm not even sure it works with a match query.
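
The range bounds under discussion can be checked directly in Python. The stop value is exclusive, so covering 1 through 7 needs a stop of 8. The variable names below are illustrative.

```python
# range(stop) starts at 0 and excludes stop.
zero_based = [str(j) for j in range(7)]

# range(start, stop) also excludes stop, so 1..7 needs stop=8.
one_to_seven = [str(j) for j in range(1, 8)]

print(zero_based)    # ['0', '1', '2', '3', '4', '5', '6']
print(one_to_seven)  # ['1', '2', '3', '4', '5', '6', '7']
```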

onegreyonewhite commented 8 years ago

@lauxley this behavior is very strange. When I run the query from Django, I only get results with obj equal to 6. For the range (1, 7), I only get results with obj = 7. But if I send the request directly to Elasticsearch, the results are also strange: the "term" blocks are ignored.

Could we use finding_multiple_exact_values for this case?

lauxley commented 8 years ago

As stated in https://github.com/liberation/django-elasticsearch/pull/53 we may need another lookup for this or at the very least to investigate a little bit more.

onegreyonewhite commented 8 years ago

I added some fixes to PR #53. Now you can use MyModel.es.filter(field__in=['one', 'two']).
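
To illustrate the idea of an "in" lookup, here is a minimal sketch of how Django-style lookups could be translated into Elasticsearch filter clauses, with field__in mapped to a "terms" filter. This is not the code from PR #53 or from django-elasticsearch; lookups_to_filters is a hypothetical helper, and it only understands exact matches, __in, and the range operators.

```python
def lookups_to_filters(**lookups):
    """Hypothetical helper: turn Django-style lookups into a bool/must filter."""
    range_ops = {"gt", "gte", "lt", "lte"}
    clauses = []
    # Sort keys so the clause order is deterministic.
    for key, value in sorted(lookups.items()):
        # Split "field__op" at the first double underscore; op is "" if absent.
        field, _, op = key.partition("__")
        if op == "in":
            # SQL-like "field IN (...)": match any of the exact values.
            clauses.append({"terms": {field: list(value)}})
        elif op in range_ops:
            clauses.append({"range": {field: {op: value}}})
        elif op == "":
            clauses.append({"term": {field: value}})
        else:
            raise ValueError("unsupported lookup: %s" % key)
    return {"bool": {"must": clauses}}


result = lookups_to_filters(
    obj__in=["1", "2"], time__gte="now-1m", server="192.168.0.251"
)
print(result)
```

The resulting dictionary has the same shape as the "filter" part of the generated query shown earlier in the thread, with the list of values carried by a single "terms" clause.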