Using pipelines #226

Closed Jensxy closed 5 years ago

Jensxy commented 5 years ago

I have created a pipeline within R. Now I want to use the pipeline to index documents with additional fields. My pipeline looks like this:

body <- '{
  "description" : "Extract attachment information encoded in Base64 with UTF-8 charset",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}'
pipeline_create(id = "attachment", body = body)

My problem is that I want to index documents with attachments (emails).

So my two questions are:

  1. How do I use the pipeline to index documents with additional fields like sender, receiver etc.?
  2. Once I have created a pipeline for an array of attachments, how do I use it to index a document that contains such an array?

Version: elastic: elastic_0.8.4.9410

{
  "name" : "74Fu38x",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "lKC9cNz8TEqDGMWUUVzweA",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

sckott commented 5 years ago

thanks for the issue @Jensxy

will have a look and get back to you soon.

sckott commented 5 years ago

  1. Have you seen these docs? It seems like you need the properties field to define further fields to index.
  2. Have you seen these docs? It seems like you need to use the foreach processor.

Jensxy commented 5 years ago

Yes, I have seen these docs, but how do I apply these things within R using the elastic package? That is my problem.

sckott commented 5 years ago

I think you have to define those things in the body of the request, passed to the body parameter.


{
  "description": "do a thing",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "",
            "properties": ["title", "name", "author"]
          }
        }
      }
    }
  ]
}

that's not tested, just guessing at what you'd need. AFAICT I don't think there are any changes needed in this package, but rather you can do this through your request body
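
To make the suggestion above concrete, here is a minimal, untested-against-a-cluster sketch of defining such a body in R and passing it to the body parameter. It assumes that each element of `attachments` keeps its Base64 payload under a `data` key (as in the single-attachment pipeline earlier in this thread), so the inner `field` is written as `_ingest._value.data`; that value and the pipeline id `attachment_array` are assumptions for illustration, not something confirmed here. The jsonlite check only verifies that the string is well-formed JSON.

```r
library(jsonlite)

# Pipeline body: run the attachment processor once per element of `attachments`.
# Assumption: each element stores its Base64 payload under `data`.
foreach_body <- '{
  "description": "Extract attachment information from an array of attachments",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data"
          }
        }
      }
    }
  ]
}'

# Catch malformed JSON locally before sending anything to the cluster.
stopifnot(jsonlite::validate(foreach_body))

# Not run here: needs a reachable Elasticsearch node with the
# ingest-attachment plugin installed. "attachment_array" is a hypothetical id.
# pipeline_create(id = "attachment_array", body = foreach_body)
```

Validating the string first is a cheap way to catch a missing brace before the request is sent, which is easy to get wrong when the JSON is typed inside an R string.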

Jensxy commented 5 years ago

Okay, I will try it. Thank you very much.

sckott commented 5 years ago

@Jensxy let me know if you get it to work

Jensxy commented 5 years ago

I've just created a foreach pipeline and then I used

es_PUT(file.path(make_url(es_get_auth()), index, "doc_1?pipeline=attachment"),
       body = body, config = es_cfg)

And my body looks like this

{"name": "test_name",
 "place": "test_place",
 "attachments" : [
    {"filename": "test_filename1",
     "data" : "test_data1"},
    {"filename": "test_filename2",
     "data" : "test_data2"}]}

Then everything works fine :)
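
For anyone adapting this, a rough sketch of building such a body from R objects follows. It assumes real attachment bytes need to be Base64-encoded before they go into `data`, as the pipeline description says. The helpers `encode_attachment()` and `index_path()`, the index name `emails`, and the type `doc` are hypothetical names for illustration; `index_path()` follows the documented Elasticsearch 6.x `/index/type/id` layout rather than the exact path used in the call above.

```r
library(jsonlite)

# Hypothetical helper: turn raw attachment bytes into the Base64 text that
# the attachment processor reads from the `data` field.
encode_attachment <- function(filename, bytes) {
  list(filename = filename, data = jsonlite::base64_enc(bytes))
}

# Extra fields (name, place) sit next to the attachments array.
doc <- list(
  name = "test_name",
  place = "test_place",
  attachments = list(
    encode_attachment("test_filename1", charToRaw("first attachment")),
    encode_attachment("test_filename2", charToRaw("second attachment"))
  )
)

# auto_unbox = TRUE writes length-one vectors as JSON scalars, not arrays.
body <- jsonlite::toJSON(doc, auto_unbox = TRUE)

# Hypothetical helper: indexing path with the pipeline named as a query
# parameter (Elasticsearch 6.x layout: /index/type/id).
index_path <- function(index, type, id, pipeline) {
  sprintf("%s/%s/%s?pipeline=%s", index, type, id,
          utils::URLencode(pipeline, reserved = TRUE))
}

index_path("emails", "doc", "1", "attachment")
```

For files on disk, the bytes could come from `readBin(path, "raw", n = file.size(path))` instead of `charToRaw()`.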

sckott commented 5 years ago

great! i'll see if I should add anything to docs to show an example

sckott commented 5 years ago

@Jensxy see the new function pipeline_attachment() and its examples