sirensolutions / siren-join

[This is the old, single node version for Elasticsearch 2.x, see the latest "Siren Federate" plugin for distributed Elasticsearch 5.x and 6.x capabilities]
http://siren.io
GNU Affero General Public License v3.0

support joining on a metafield #63

Open paulwilton opened 8 years ago

paulwilton commented 8 years ago

Hi, is it possible to join a field from the inner index to the "_id" field (the Elasticsearch document identifier) of the docs in the outer index, rather than to a field from the _source object? For example:

{
  "bool" : {
    "filter" : {
      "filterjoin" : {
        "_id" : {
          "indices" : ["my-index"],
          "path" : "pathToKeyId",
          "query" : {
            "bool" : {
              "filter" : [
                {"term" : {"someField" : "someValue"}}
              ]
            }
          }
        }
      }
    }
  }
}

scampi commented 8 years ago

It should be possible to join on the _id field. The only drawback is that you cannot use doc_values with it, although Elasticsearch is working on allowing this. As a result, you might need more memory to perform the join.

Please reply if it actually doesn't work! Thanks.

scampi commented 8 years ago

@paulwilton I tried on my end and it doesn't seem possible. We'll work on supporting this.

paulwilton commented 8 years ago

Hi Stéphane, no problem. I checked as well and have now worked around it by surfacing a key as a document property in the target index. I had assumed (possibly incorrectly) that using the "_id" property in the outer index would be more performant, since the Elasticsearch index would have an efficient lookup mechanism for it.

thanks, Paul

rendel commented 8 years ago

@paulwilton after some investigation: the _id field in Elasticsearch is not indexed [1]; it is derived from the _uid field. There is a discussion in Elasticsearch about supporting doc values for the _id field [2]. If that issue is resolved, it will be easier for us to support this. Right now, to support joining on an _id field, we would have to add an internal mapping from the _id field to the _uid field.

An easier fallback solution is, as you suggested, to add the id explicitly as a document field (not indexed, but with doc values activated). This comes at the cost of a larger index, but the upside is that using doc values is more heap friendly (the doc values are cached off heap).

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
[2] https://github.com/elastic/elasticsearch/issues/11887
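
For anyone landing here later, the fallback described above could look roughly like the sketch below. It only builds the JSON bodies (a mapping, a document, and the rewritten filterjoin query) and does not talk to a cluster. The type name `my_type` and the field name `doc_id` are hypothetical, and the mapping assumes Elasticsearch 2.x string-field syntax (`"index": "no"` together with `"doc_values": true`); please check it against your own version before relying on it.

```python
import json

# Hypothetical names, not taken from the thread: adjust to your own setup.
OUTER_TYPE = "my_type"
KEY_FIELD = "doc_id"


def explicit_id_mapping(doc_type=OUTER_TYPE, field=KEY_FIELD):
    """Mapping for a copy of the document id: not indexed, doc values on.

    Assumes Elasticsearch 2.x mapping syntax for string fields.
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {
                        "type": "string",
                        "index": "no",
                        "doc_values": True,
                    }
                }
            }
        }
    }


def with_explicit_id(doc_id, source, field=KEY_FIELD):
    """Return a copy of the document body that carries its own id as a field."""
    body = dict(source)
    body[field] = doc_id
    return body


def filterjoin_on(field, indices, path, query):
    """Same shape as the query in the question, keyed on `field` instead of "_id"."""
    return {
        "bool": {
            "filter": {
                "filterjoin": {
                    field: {"indices": indices, "path": path, "query": query}
                }
            }
        }
    }


if __name__ == "__main__":
    inner_query = {"bool": {"filter": [{"term": {"someField": "someValue"}}]}}
    print(json.dumps(explicit_id_mapping(), indent=2))
    print(json.dumps(with_explicit_id("42", {"title": "example"})))
    print(json.dumps(
        filterjoin_on(KEY_FIELD, ["my-index"], "pathToKeyId", inner_query),
        indent=2,
    ))
```

The id is duplicated at index time, so the join key lives in an ordinary field that can use doc values; the cost is the extra index size mentioned above.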