salyh / elasticsearch-imap

IMAP and POP3 email importer for Elasticsearch (no river anymore)
Apache License 2.0

elasticsearch-importer-imap Elasticsearch 2.x

Support Elasticsearch 5.0 readiness and keep the Elasticsearch IMAP importer free. Currently the IMAP importer only works with Elasticsearch 2.x, and updating and maintaining it for Elasticsearch 5 takes a lot of time and effort. Donations welcome!

Pledgie:
Click here to lend your support to: Elasticsearch IMAP Importer and make a donation at pledgie.com!

Paypal:
Donate

Patreon:
https://patreon.com/salyh


Import e-mails from IMAP (and POP3) into Elasticsearch 2.x

E-Mail hendrikdev22@gmail.com

Twitter @hendrikdev22

This importer connects to IMAP4 or POP3 servers, polls your mail, and indexes it. The emails on the server are never modified or removed. After the initial full load, the importer tracks which mails are new or have been deleted and updates the index only for those mails.
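The incremental update can be pictured as a set difference on message UIDs (each indexed mail carries its IMAP `uid`, as the content example in this README shows). This is a minimal sketch under that assumption, not the importer's actual bookkeeping, and `plan_sync` is a hypothetical name. On the first run nothing is indexed yet, so every UID counts as new, which corresponds to the initial full load. A real IMAP implementation would also have to check the folder's UIDVALIDITY, because UIDs are only guaranteed stable while it stays the same.

```python
from typing import Iterable, Set, Tuple


def plan_sync(server_uids: Iterable[int], indexed_uids: Iterable[int]) -> Tuple[Set[int], Set[int]]:
    """Hypothetical helper: decide what one folder needs after a poll.

    Returns (to_index, to_delete).
    """
    server = set(server_uids)
    indexed = set(indexed_uids)
    to_index = server - indexed   # on the server but not yet in the index
    to_delete = indexed - server  # in the index but gone from the server
    return to_index, to_delete
```

Only the two difference sets touch Elasticsearch, so a poll with no new or deleted mails costs one UID listing and no index writes.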

Features:

The importer acts as a disconnected client: it polls the server, opening a new connection for every indexing run and closing it once the work is done.
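A single disconnected-client run can be sketched with Python's standard `imaplib`. This is an illustration only (the importer itself is a Java application), and `make_connect` and `poll_once` are hypothetical names. The folder is opened read-only (`readonly=True` makes `imaplib` send EXAMINE instead of SELECT), in line with the promise that mails on the server are never modified, and the connection is closed in `finally` whether or not the run succeeds.

```python
import imaplib
from typing import Callable, List


def make_connect(host: str, port: int, user: str, password: str) -> Callable[[], imaplib.IMAP4]:
    """Hypothetical factory: each call opens and logs in a fresh IMAPS connection."""
    def connect() -> imaplib.IMAP4:
        conn = imaplib.IMAP4_SSL(host, port)
        conn.login(user, password)
        return conn
    return connect


def poll_once(connect: Callable[[], imaplib.IMAP4], folder: str = "INBOX") -> List[int]:
    """One indexing run: open a new connection, list the folder's UIDs, close it."""
    conn = connect()
    try:
        # readonly=True sends EXAMINE, so the mailbox state on the server is untouched
        typ, _ = conn.select(folder, readonly=True)
        if typ != "OK":
            raise RuntimeError(f"cannot open folder {folder!r}")
        typ, data = conn.uid("SEARCH", None, "ALL")
        if typ != "OK":
            raise RuntimeError("UID SEARCH failed")
        # data[0] is a space-separated byte string such as b"1 2 5"
        return [int(uid) for uid in data[0].split()]
    finally:
        conn.logout()  # the connection never outlives a single run
```

Calling `poll_once(make_connect("imap.server.com", 993, "user@domain.com", "secret"))` once per `interval` (60 seconds in the sample configuration) would approximate the polling loop; the returned UIDs would then feed the incremental update described above.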

Installation

Prerequisites:

Download the .zip or .tar.gz archive from https://github.com/salyh/elasticsearch-river-imap/releases/latest (version 0.8.6 or higher only) and unpack it somewhere.

Then run

Configuration

Put the following configuration in a file with a .json extension and store it somewhere.

{
   "mail.store.protocol":"imap",
   "mail.imap.host":"imap.server.com",
   "mail.imap.port":993,
   "mail.imap.ssl.enable":true,
   "mail.imap.connectionpoolsize":"3",
   "mail.debug":"false",
   "mail.imap.timeout":10000,
   "users":["user@domain.com"],
   "passwords":["secret"],
   "schedule":null,
   "interval":"60s",
   "threads":5,
   "folderpattern":null,
   "bulk_size":100,
   "max_bulk_requests":"30",
   "bulk_flush_interval":"5s",
   "mail_index_name":"imapriverdata",
   "mail_type_name":"mail",
   "with_striptags_from_textcontent":true,
   "with_attachments":false,
   "with_text_content":true,
   "with_flag_sync":true,
   "keep_expunged_messages":false,
   "index_settings" : null,
   "type_mapping" : null,
   "user_source" : null,
   "ldap_url" : null,
   "ldap_user" : null,
   "ldap_password" : null,
   "ldap_base" : null,
   "ldap_name_field" : "uid",
   "ldap_password_field" : null,
   "ldap_refresh_interval" : null,
   "master_user" : null,
   "master_password" : null,

   "client.transport.ignore_cluster_name": false,
   "client.transport.ping_timeout": "5s",
   "client.transport.nodes_sampler_interval": "5s",
   "client.transport.sniff": true,
   "cluster.name": "elasticsearch",
   "elasticsearch.hosts": "localhost:9300,127.0.0.1:9300"

}

Note: For POP3 only the "INBOX" folder is supported. This is a limitation of the POP3 protocol.
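A quick pre-flight check of such a file can be sketched in Python. The checks are assumptions drawn from the sample above, not rules enforced by the importer: that `users[i]` logs in with `passwords[i]` (when users are listed inline rather than loaded via `user_source`), and that the host key follows the JavaMail convention `mail.<protocol>.host` seen in `mail.imap.host`. `config_problems` is a hypothetical helper.

```python
import json
from typing import List


def config_problems(text: str) -> List[str]:
    """Hypothetical sanity check for an importer config; returns a list of problems."""
    cfg = json.loads(text)
    problems: List[str] = []

    protocol = cfg.get("mail.store.protocol")
    if not protocol:
        problems.append("mail.store.protocol is not set")
    elif f"mail.{protocol}.host" not in cfg:
        # JavaMail reads the host as mail.<protocol>.host, e.g. mail.imap.host or mail.pop3.host
        problems.append(f"mail.{protocol}.host is missing")

    # Assumption: credentials are paired by position, users[i] with passwords[i]
    users = cfg.get("users") or []
    passwords = cfg.get("passwords") or []
    if len(users) != len(passwords):
        problems.append(f"{len(users)} users but {len(passwords)} passwords")

    return problems
```

For example, `config_problems(open("imap-importer.json").read())` (hypothetical file name) should return an empty list for the sample configuration shown above.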

Default Mapping Example

{
  "mail" : {
        "properties" : {
          "attachmentCount" : {
            "type" : "long"
          },
          "bcc" : {
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "personal" : {
                "type" : "string"
              }
            }
          },
          "cc" : {
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "personal" : {
                "type" : "string"
              }
            }
          },
          "contentType" : {
            "type" : "string"
          },
          "flaghashcode" : {
            "type" : "integer"
          },
          "flags" : {
            "type" : "string"
          },
          "folderFullName" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "folderUri" : {
            "type" : "string"
          },
          "from" : {
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "personal" : {
                "type" : "string"
              }
            }
          },
          "headers" : {
            "properties" : {
              "name" : {
                "type" : "string"
              },
              "value" : {
                "type" : "string"
              }
            }
          },
          "mailboxType" : {
            "type" : "string"
          },
          "receivedDate" : {
            "type" : "date",
            "format" : "basic_date_time"
          },
          "sentDate" : {
            "type" : "date",
            "format" : "basic_date_time"
          },
          "size" : {
            "type" : "long"
          },
          "subject" : {
            "type" : "string"
          },
          "textContent" : {
            "type" : "string"
          },
          "to" : {
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "personal" : {
                "type" : "string"
              }
            }
          },
          "uid" : {
            "type" : "long"
          }
        }
      }
    }
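In this mapping `folderFullName` is `not_analyzed`, so it can be matched exactly with a `term` filter, while `subject` is analyzed for full-text search. The Python sketch below builds such a search request for the Elasticsearch 2.x REST API. The HTTP endpoint `http://localhost:9200` and the helper name `folder_search_request` are assumptions, not part of the importer (which talks to the cluster over the transport port 9300 listed in `elasticsearch.hosts`).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: Elasticsearch HTTP endpoint


def folder_search_request(folder: str, text: str, index: str = "imapriverdata") -> urllib.request.Request:
    """Hypothetical helper: full-text search on subject, restricted to one folder."""
    body = {
        "query": {
            "bool": {
                # analyzed field: matches individual words of the subject
                "must": {"match": {"subject": text}},
                # not_analyzed field: the folder name must match exactly
                "filter": {"term": {"folderFullName": folder}},
            }
        }
    }
    return urllib.request.Request(
        f"{ES_URL}/{index}/mail/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and reading `json.load(resp)["hits"]["hits"]` would return the matching mail documents.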

For advanced mapping ideas look here:

Advanced Mapping Example (to be set manually using "type_mapping")

{
   "mail":{
      "properties":{
         "textContent":{
            "type":"langdetect"
         },
         "email":{
            "type":"string",
            "index":"not_analyzed"
         },
         "subject":{
            "type":"multi_field",
            "fields":{
               "text":{
                  "type":"string"
               },
               "raw":{
                  "type":"string",
                  "index":"not_analyzed"
               }
            }
         },
         "personal":{
            "type":"multi_field",
            "fields":{
               "title":{
                  "type":"string"
               },
               "raw":{
                  "type":"string",
                  "index":"not_analyzed"
               }
            }
         }
      }
   }
} 
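The `raw` sub-fields make exact-value operations possible. As an illustration, assuming the advanced mapping above has been applied and Elasticsearch 2.x is reachable over HTTP, a terms aggregation on `subject.raw` groups mails by their complete subject line instead of by individual words. `top_subjects_body` and `parse_buckets` are hypothetical helpers; the body would be posted to `/imapriverdata/mail/_search`.

```python
from typing import Any, Dict, List, Tuple


def top_subjects_body(size: int = 10) -> Dict[str, Any]:
    """Hypothetical helper: search body returning only a terms aggregation."""
    return {
        "size": 0,  # skip the hits, only the buckets are wanted
        "aggs": {
            # subject.raw is not_analyzed, so each bucket key is one whole subject line
            "top_subjects": {"terms": {"field": "subject.raw", "size": size}}
        },
    }


def parse_buckets(response: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Turn a search response into (subject, mail count) pairs."""
    buckets = response["aggregations"]["top_subjects"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]
```

With the default mapping, where `subject` is only analyzed, the same aggregation on `subject` would return single words as bucket keys, which is the reason for the extra `raw` field.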

Content Example

{
      "_index" : "imapriverdata",
      "_type" : "mail",
      "_id" : "50220::imap://test%40xxx.com@imap.strato.de/import",
      "_score" : 1.0, "_source" : {
  "attachmentCount" : 0,
  "attachments" : null,
  "bcc" : null,
  "cc" : null,
  "contentType" : "text/plain; charset=ISO-8859-15",
  "flaghashcode" : 16,
  "flags" : [ "Recent" ],
  "folderFullName" : "test",
  "folderUri" : "imap://test%40xxx.com@imap.strato.de/import",
  "from" : {
    "email" : "suchagent@isrch.de",
    "personal" : null
  },
  "headers" : [ {
    "name" : "Subject",
    "value" : "Suchagent Wohnung mieten in Berlin -  1 neues Objekt gefunden!"
  }, {
    "name" : "Return-Path",
    "value" : "<suchagent@isrch.de>"
  }, {
    "name" : "Content-Transfer-Encoding",
    "value" : "quoted-printable"
  }, {
    "name" : "To",
    "value" : "sss@ddd.org"
  }, {
    "name" : "X-OfflineIMAP-1722382714-52656d6f7465-6165727a7465",
    "value" : "1248516496-0146849121575-v5.99.4"
  }, {
    "name" : "Message-ID",
    "value" : "<8277550.1132283844462.JavaMail.noreply@isrch.de>"
  }, {
    "name" : "Mime-Version",
    "value" : "1.0"
  }, {
    "name" : "X-Gmail-Labels",
    "value" : "ablage,hendrik.yyy@gmx.de"
  }, {
    "name" : "X-GM-THRID",
    "value" : "1309162987234255956"
  }, {
    "name" : "Delivered-To",
    "value" : "GMX delivery to sss@ddd.org"
  }, {
    "name" : "Reply-To",
    "value" : "suchagent@isrch.de"
  }, {
    "name" : "Date",
    "value" : "Fri, 18 Nov 2005 04:17:24 +0100 (MET)"
  }, {
    "name" : "Auto-Submitted",
    "value" : "auto-generated"
  }, {
    "name" : "Received",
    "value" : "(qmail invoked by alias); 18 Nov 2005 03:17:25 -0000"
  }, {
    "name" : "Content-Type",
    "value" : "text/plain; charset=\"ISO-8859-15\""
  }, {
    "name" : "From",
    "value" : "suchagent@isrch.de"
  } ],
  "mailboxType" : "IMAP",
  "popId" : null,
  "receivedDate" : 1132283845000,
  "sentDate" : 1132283844000,
  "size" : 3645,
  "subject" : "Suchagent Wohnung mieten in Berlin -  1 neues Objekt gefunden!",
  "textContent" : "Sehr geehrter Nutzer, ... JETZT AUCH IM FERNSEHEN: IMMOBILIENANGEBOTE FÜR HAMBURG UND UMGEBUNG!\r\n\tFinden Sie Ihre Wunschwohnung oder  ...",
  "to" : [ {
    "email" : "sss@ddd.org",
    "personal" : null
  } ],
  "uid" : 50220
}
    } 
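Two details of this document can be decoded mechanically. Judging from the example (not from a documented contract), `_id` is the message `uid` and the `folderUri` joined by `::`, and `sentDate`/`receivedDate` are epoch milliseconds: 1132283844000 is 03:17:24 UTC on 18 Nov 2005, which matches the `Date` header (04:17:24 +0100). `split_doc_id` and `from_epoch_millis` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_doc_id(doc_id: str) -> Tuple[int, str]:
    """Split an _id such as '50220::imap://...' into (uid, folderUri).

    Assumption: the format is inferred from the content example only.
    """
    uid, sep, folder_uri = doc_id.partition("::")
    if not sep:
        raise ValueError(f"unexpected _id format: {doc_id!r}")
    return int(uid), folder_uri


def from_epoch_millis(ms: int) -> datetime:
    """Convert sentDate/receivedDate (milliseconds since the epoch) to a UTC datetime."""
    # timedelta arithmetic keeps the milliseconds exact (no float rounding)
    return EPOCH + timedelta(milliseconds=ms)
```

Applied to the example, `split_doc_id` yields uid 50220 and the `folderUri` value, and `from_epoch_millis(1132283845000)` gives 03:17:25 UTC, the time in the `Received` header.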

Indexing attachments

If you also want to index your mail attachments, look here:

Contributors/Credits

License

Copyright (C) 2014-2015 by Hendrik Saly (http://saly.de) and others.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.