jprante / elasticsearch-knapsack

Knapsack plugin is an import/export tool for Elasticsearch
Apache License 2.0

Exports only one search result #67

Status: Open (opened by Thilak-T 9 years ago)

Thilak-T commented 9 years ago

I tried to export with the following POST request:

POST http://localhost:9200/_export/?path=/tmp/elasticsearch/test.bulk
{
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "environment": "beta"
                }
            },
            "filter": {
                "range": {
                    "@timestamp": {
                        "gte": "2015-04-29T00:00:00.000Z",
                        "time_zone": "+8:00"
                    }
                }
            }
        }
    },
    "fields": [
        "message"
    ]
}

It always exports only the first result. If I remove the search query, it exports all data properly.

Thilak-T commented 9 years ago

Does the query in the POST export request support the full Elasticsearch query DSL?

jprante commented 9 years ago

Check if

curl -XPOST 'http://localhost:9200/_search' -d '
{
    "fields": [
        "message"
    ],
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "@timestamp": {
                        "gte": "2015-04-29T00:00:00.000Z",
                        "time_zone": "+8:00"
                    }
                }
            },
            "query": {
                "match": {
                    "environment": "beta"
                }
            }
        }
    }
}
'

returns the expected number of total hits.

Thilak-T commented 9 years ago

Still the same... it has only one hit.

jprante commented 9 years ago

If _search returns only one hit, Knapsack can also export only one hit.

Thilak-T commented 9 years ago

Yes, of course... _search gives 1091 hits, but Knapsack exports only one.
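To compare the two numbers directly, the exported documents can be counted in the bulk file. This is a minimal sketch, assuming every exported document starts with an action line beginning `{"index":` (as in the export file shown later in this thread); `count_exported_docs` is a hypothetical helper, not part of Knapsack:

```shell
# Hypothetical helper: count documents in a Knapsack bulk export file.
# Assumption: each exported document begins with an action line that
# starts with {"index": followed by a line holding the document data.
count_exported_docs() {
  # grep -c prints the number of matching lines (0 if none match)
  grep -c '^{"index":' "$1"
}
```

Usage: run `count_exported_docs /tmp/export.bulk` and compare the result with the `"total"` value under `"hits"` in the _search response for the same query.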

mattburns commented 8 years ago

I also get this exact behaviour. Did you solve it in the end?

Proof:

curl -XPOST 'localhost:9200/scf/doc/_export?path=/tmp/export.bulk&overwrite=true&pretty=true' -d '{
   "query" : {
       "match" : {
           "SerialNumber" : "480417204"
       }
   },
   "fields" : [ "_id", "url" ]
}'

However, the export contains only one result:

cat /tmp/export.bulk
{"index":{"_index":"scf","_type":"doc","_id":"http://farm6.staticflickr.com/5056/5521728468_7a187ba9bc_o.jpg"}
http://farm6.staticflickr.com/5056/5521728468_7a187ba9bc_o.jpg

Changing the command from _export to _search shows that the same query matches 212 results (output truncated):

curl -XPOST 'localhost:9200/scf/doc/_search?path=/tmp/export.bulk&overwrite=true&pretty=true' -d '{
   "query" : {
       "match" : {
           "SerialNumber" : "480417204"
       }
   },
   "fields" : [ "_id", "url" ]
}'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 212,
    "max_score" : 12.048178,
    "hits" : [ {
      "_index" : "scf",
      "_type" : "doc",
      "_id" : "http://mattburns.co.uk/images/0.jpg",
      "_score" : 12.048178,
      "fields" : {
        "url" : [ "http://mattburns.co.uk/images/0.jpg" ]
      }
    }, {
      "_index" : "scf",
      "_type" : "doc",
      "_id" : "http://mattburns.co.uk/smiletapper/resources/unhappy/15.jpg",
      "_score" : 12.048178,
      "fields" : {
        "url" : [ "http://mattburns.co.uk/smiletapper/resources/unhappy/15.jpg" ]
      }
    } 
[...snip...]

mattburns commented 8 years ago

As an update, I found a workaround that works for me: all results are exported if I don't specify "fields". I can still restrict the output to the fields I want by listing them in "_source" instead (source filtering). See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-get.html#get-source-filtering

In short, just change "fields": [ to "_source": [ in your request.
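Applied to the export request from earlier in this thread, the workaround might look like the sketch below. It assumes a local node on port 9200 with the Knapsack plugin installed and the `scf` index and `doc` type from that example; `EXPORT_BODY` is only an illustrative variable name. `"_id"` is left out of the `_source` list because each document's `_id` is already written in its bulk action line, as in the export file above.

```shell
# Sketch: the export request from this thread with "fields" replaced by "_source".
# Assumes a local node on port 9200 with Knapsack and the scf/doc index/type.
EXPORT_BODY='{
  "query" : {
    "match" : {
      "SerialNumber" : "480417204"
    }
  },
  "_source" : [ "url" ]
}'

# Send the export request; report on stderr if the node cannot be reached.
curl -XPOST 'localhost:9200/scf/doc/_export?path=/tmp/export.bulk&overwrite=true&pretty=true' \
  -d "$EXPORT_BODY" || echo "export request failed (is the node running?)" >&2
```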