SuperIMAP - Monitor inboxes for incoming email, at scale.
SuperIMAP - Version 0.1.2

SuperIMAP helps you build email-driven applications. It takes care of connecting to a customer's IMAP inbox, watching for new email, and triggering a webhook to your application when a new email arrives, typically within seconds.

SuperIMAP is built for scale. FiveStreet.com built SuperIMAP as an alternative to Context.io. It contains a subset of Context.io Lite API functionality. As of July 2015, the FiveStreet team runs a SuperIMAP cluster processing ~400k emails per day for thousands of users.

SuperIMAP is written in Ruby and open sourced under the MIT license. Why Ruby?

Installation

SuperIMAP was built to run on Heroku, but can run in any environment that supports Rails.

Installing on Heroku

  1. Provision a new Heroku project.
  2. Add an encrypted database.
  3. Set the SECRET_KEY_BASE and ENCRYPTION_KEY environment variables to something really long and complicated.
  4. Add a Heroku remote endpoint, and push the code.
  5. Ramp up some workers.
  6. Seed the database with heroku run rake db:setup db:seed

Then, log in as the default user: admin@example.com / password.

Remember: Change the username and password immediately!

The production Procfile assumes that you are installing on Heroku. As a result it has multiple definitions for the imap_client process corresponding to different sized Heroku dynos. In order for load balancing to work correctly, you should have all imap_client processes use the same dyno size. Do not mix and match boxes.
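
One way to produce suitably long random values for SECRET_KEY_BASE and ENCRYPTION_KEY is Ruby's standard SecureRandom library. This is a minimal sketch: the helper name generate_secret and the 64-byte length are illustrative assumptions, not documented SuperIMAP requirements.

```ruby
require 'securerandom'

# Generate a random hex string. 64 random bytes become 128 hex characters.
# The byte count is an illustrative choice, not a SuperIMAP requirement.
def generate_secret(bytes = 64)
  SecureRandom.hex(bytes)
end

puts "SECRET_KEY_BASE=#{generate_secret}"
puts "ENCRYPTION_KEY=#{generate_secret}"
```

On Heroku, the printed values can then be stored with heroku config:set. Generate the two values separately so that the keys are not shared.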

Installing Elsewhere

  1. Get the code: git clone https://github.com/rustyio/super-imap.git
  2. Update config/database.yml. (Use config/database.yml.example)
  3. Run bundle to install dependencies.
  4. Set the SECRET_KEY_BASE and ENCRYPTION_KEY environment variables to something really long and complicated.
  5. Seed the database with RAILS_ENV=production rake db:setup db:seed
  6. Start the processes: foreman start -c "web=1, worker=1, imap_client_1x=1"

Then, log in as the default user: admin@example.com / password.

Remember: Change the username and password immediately!

The production Procfile assumes that you are installing on Heroku. As a result it has multiple definitions for the imap_client process corresponding to different sized Heroku dynos. In order for load balancing to work correctly, you should ensure that all imap_client processes are the same size and point to the same database.

Usage

Set up a Partner

A single SuperIMAP instance can support multiple applications (and/or multiple environments for a single application). This is done by creating a new "Partner" for each application (and/or environment).

  1. Open the SuperIMAP dashboard.
  2. Click on the "Partners" tab. Click "New Partner".
  3. Set webhooks to notify your application of the following events:
    • A new email has arrived.
    • A user has connected their email account.
    • A user has disconnected their email account.

Set up a Partner Connection

  1. Still within the dashboard, click the "Partners" tab.
  2. Click on the "Connections" link next to your Partner.
  3. Click on "New Partner Connection".
  4. Choose an authentication type, and fill out any necessary credentials.

For Gmail, you will need to get OAuth 2.0 client credentials from the Google Developer Console.

Add and Connect a User

  1. Still within the dashboard, click on "Partners".
  2. Click on the "Connections" link next to your partner.
  3. Click on the "Users" link next to your connection.
  4. Click on the "New User" button.
  5. Click the 'Connect' link to connect the user to an IMAP provider.
  6. Send yourself email, and watch the logs!

Add and Connect a User Programmatically

The next step is to update your application so that it creates and connects SuperIMAP users itself. Here is some example code:

    require 'json'
    require 'rest-client'

    url = "https://my-host.com/api/v1/connections/GMAIL_OAUTH2/users"
    users = RestClient::Resource.new(url, :headers => {
      :'x-api-key'  => "$API_KEY$",
      :content_type => :json,
      :accept       => :json
    })

    # Create the user.
    users.post(:tag => "MY_USER")

    # Get the connect url.
    response = users["MY_USER"].get
    connect_url = JSON.parse(response)['connect_url']

    # Set up the success and failure callbacks.
    callbacks = {
      :success => "http://my-app.com/connect_callback?success=1",
      :failure => "http://my-app.com/connect_callback?failure=1"
    }

    # Redirect the user to the connect url.
    redirect_to connect_url + "?" + callbacks.to_query

Disconnect a User Programmatically

Below is sample code to disconnect a user:

    require 'json'
    require 'rest-client'

    url = "https://my-host.com/api/v1/connections/GMAIL_OAUTH2/users"
    users = RestClient::Resource.new(url, :headers => {
      :'x-api-key'  => "$API_KEY$",
      :content_type => :json,
      :accept       => :json
    })

    # Later, if you want to disconnect the user.
    response = users["MY_USER"].get
    disconnect_url = JSON.parse(response)['disconnect_url']

    # Set up the success and failure callbacks.
    callbacks = {
      :success => "http://my-app.com/disconnect_callback?success=1"
    }

    # Redirect the user to the disconnect url.
    redirect_to disconnect_url + "?" + callbacks.to_query

Security

This is a good time to mention security. It is a big responsibility to hold the keys to someone's email. Treat it with the appropriate amount of caution.

If you use this code:

Other security measures within SuperIMAP:

Webhooks

SuperIMAP sends new email events (and other events) to your applications through webhooks:

Webhook Security

All webhooks are signed. You can validate the signature as follows:

    # Parse the incoming JSON body.
    json_params = JSON.parse(request.raw_post)

    # Calculate expected signature.
    digest    = OpenSSL::Digest.new('sha256')
    api_key   = Rails.application.config.super_imap_api_key
    sha1      = json_params['sha1']
    timestamp = json_params['timestamp']
    expected_signature = OpenSSL::HMAC.hexdigest(digest, api_key, "#{timestamp}#{sha1}")

    # Get actual signature.
    actual_signature = json_params['signature']

    # Compare signatures.
    valid = expected_signature == actual_signature
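
The same check can be packaged as a standalone function that does not depend on Rails. The sketch below assumes the webhook body carries the sha1, timestamp, and signature fields used in the snippet above. The function names are hypothetical, and the constant-time comparison is an extra hardening step that is not part of the original snippet.

```ruby
require 'json'
require 'openssl'

# Compare two strings without stopping at the first differing byte,
# so the comparison time does not reveal how much of the value matched.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Returns true when the signature in the JSON body matches the HMAC-SHA256
# of "#{timestamp}#{sha1}" keyed with the partner's API key.
def valid_webhook_signature?(raw_body, api_key)
  params   = JSON.parse(raw_body)
  payload  = "#{params['timestamp']}#{params['sha1']}"
  digest   = OpenSSL::Digest.new('sha256')
  expected = OpenSSL::HMAC.hexdigest(digest, api_key, payload)
  secure_compare(expected, params['signature'].to_s)
end
```

In a Rails controller this could be called as valid_webhook_signature?(request.raw_post, api_key), rejecting the request when it returns false.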

New Mail Webhook

Called when a new mail arrives in a user's inbox.

User Connected Webhook

Called when a user has successfully authenticated with an IMAP provider. Only applies to OAuth connections at the moment.

User Disconnected Webhook

Called when a user has disconnected from an IMAP provider. Only applies to OAuth connections at the moment.

API

All API calls are scoped by partner. To authenticate, send the Partner's API key using a header or a parameter. (A header is preferred because it won't normally appear in HTTP logs.)

    # Access the API using curl:
    curl -H "Accept: application/json" \
         -H "x-api-key:APIKEY" \
         https://my-host.com/api/v1/connections

    # Access the API using the rest-client gem:
    url = "https://my-host.com/api/v1"
    resource = RestClient::Resource.new(url, :headers => {
      :'x-api-key'  => "$API_KEY$",
      :content_type => :json,
      :accept       => :json
    })
    resource['connections'].get

/api/v1/connections

GET

Get a list of connections for the specified partner.

POST

Create a new connection.


/api/v1/connections/:IMAP_PROVIDER_CODE

GET

Get information about a given connection.

PUT

Update settings for a given connection. The required parameters depend on the IMAP provider used.

DELETE

Delete a connection and all underlying user data.


/api/v1/connections/:IMAP_PROVIDER_CODE/users

GET

Get a list of users for the specified IMAP Provider.

POST

Create a new user.


/api/v1/connections/:IMAP_PROVIDER_CODE/users/:TAG

GET

Get information about the given user, including:

PUT

Update a user. The required parameters depend on the IMAP provider used.

DELETE

Archive a user. The user can be restored in the web interface, or by updating the user (i.e., a PUT request).

Operations

Process Types

SuperIMAP consists of 3 different processes, all written in Ruby / Rails:

Environment Variables

Scaling

To scale SuperIMAP, you will mainly want to increase the number of IMAP Client processes. The IMAP Client processes automatically publish a heartbeat every 10 seconds. Other instances look for this heartbeat and re-calculate which neighboring processes are alive based on any processes that have published a heartbeat within the last 30 seconds.
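
The liveness rule can be illustrated with a few lines of Ruby. This is only a sketch: how SuperIMAP actually stores heartbeats is not described here, so the hash of instance id to last-heartbeat time and the function name live_instances are assumptions.

```ruby
# An instance counts as alive if it published a heartbeat in this window.
HEARTBEAT_WINDOW = 30 # seconds

# heartbeats: Hash mapping an instance id to the Time of its last heartbeat
# (a hypothetical data shape chosen for illustration).
def live_instances(heartbeats, now = Time.now)
  heartbeats.select { |_id, last_seen| now - last_seen <= HEARTBEAT_WINDOW }
            .keys
            .sort
end
```

Sorting the ids gives every instance the same view of the pool, which matters when each instance computes assignments independently.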

The IMAP Client processes re-balance users every 10 seconds. If no new instances have entered or left the pool, then re-balancing will have no effect.

If a new IMAP Client instance is started, a small number of users are taken from each running instance and handed to the new instance. If an IMAP Client instance is stopped, it is removed from the pool and its users are evenly distributed among the remaining instances (assuming they are still below the MAX_USER_THREADS threshold).

There is no "master" process that decides which IMAP Client process should handle a given user. SuperIMAP uses a Rendezvous Hash to allow IMAP Client instances to agree on how to evenly assign users without any central coordination. The algorithm assumes that all SuperIMAP instances have roughly the same number of resources.
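
To make the idea concrete, here is a minimal sketch of rendezvous (highest random weight) hashing. It is not SuperIMAP's actual code: the choice of SHA-256, the key format, and the function names are assumptions made for illustration.

```ruby
require 'digest'

# Score a (user, instance) pair. Every instance computes the same score
# for the same pair, so no coordination is needed.
def score(user_tag, instance_id)
  Digest::SHA256.hexdigest("#{instance_id}:#{user_tag}").to_i(16)
end

# The instance with the highest score for a user owns that user.
def owner_for(user_tag, instance_ids)
  instance_ids.max_by { |id| score(user_tag, id) }
end

instances = %w[imap_client.1 imap_client.2 imap_client.3]
%w[alice bob carol].each do |user|
  puts "#{user} -> #{owner_for(user, instances)}"
end
```

When an instance leaves the pool, only the users it owned get a new owner, because every other user's highest-scoring instance is unchanged. That is why re-balancing has no effect while the pool is stable.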

Monitoring

SuperIMAP publishes some useful monitoring information in the logs. This includes:

These metrics are published in a format that can be consumed by the Librato Add-On in Heroku. See https://devcenter.heroku.com/articles/librato#custom-log-based-metrics for more information.

Apart from keeping an eye on these metrics, SuperIMAP should need no other regular maintenance.

You may also want to keep an eye out for any failing Delayed Job tasks. You can view these from the Admin site.

Tracer Emails

SuperIMAP has the ability to give you useful monitoring information through "tracer emails". The system will send a specially formatted email to an account, wait for the incoming email, and log the results. The logs can be accessed through the "Tracer Logs" tab.

To enable Tracer Emails, navigate to a user and check the "Enable Tracer" checkbox. It is recommended that you create a few dummy email addresses to use for tracer emails.

By default, a cluster of three tracers is sent every ten minutes from each imap_client instance to a random tracer-enabled user managed by that instance.

Keep in mind that this could generate a lot of email. Three emails every ten minutes works out to ~430 emails per day.
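
Because each imap_client instance sends its own tracers, the daily volume grows with the number of instances. A small worked calculation, using only the default figures above (the function name is illustrative):

```ruby
TRACERS_PER_CLUSTER      = 3
CLUSTER_INTERVAL_MINUTES = 10

# Tracer emails sent per day for a given number of imap_client instances.
def tracer_emails_per_day(instances = 1)
  clusters_per_day = (24 * 60) / CLUSTER_INTERVAL_MINUTES # 144 clusters
  TRACERS_PER_CLUSTER * clusters_per_day * instances
end

puts tracer_emails_per_day    # 432
puts tracer_emails_per_day(5) # 2160
```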

Performance

SuperIMAP's architecture makes judicious use of system resources:

All connections to the IMAP server are managed by separate "user threads", but these threads sit dormant most of the time. When anything interesting happens that requires real work, the operation is queued and handled by a worker in a worker pool. The size of the worker pool is controlled by the NUM_WORKER_THREADS environment variable. Only worker pool threads, and a small number of other system threads, require a database connection.

In terms of tradeoffs, this architecture chooses to slightly degrade an individual user's response time in favor of making sure that the system will not get overloaded when things get rough. When things get busy, the work simply builds up in the queue. The size of the worker queue, and the queue latency, becomes a rough measure of system health.
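
The queue-plus-worker-pool pattern described above can be sketched with Ruby's built-in Queue and Thread classes. This is an illustration of the general technique, not SuperIMAP's implementation. The class name WorkerPool and its methods are hypothetical.

```ruby
require 'thread' # provides Queue on older Rubies; a no-op on modern ones

class WorkerPool
  def initialize(size)
    @queue = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Block until a job arrives; a nil entry tells the worker to stop.
        while (job = @queue.pop)
          enqueued_at, work = job
          latency = Time.now - enqueued_at # time the job spent waiting
          work.call(latency)
        end
      end
    end
  end

  # Queue a block of work; it runs later on one of the worker threads.
  def schedule(&work)
    @queue << [Time.now, work]
  end

  # Number of jobs waiting: a rough health signal when the system is busy.
  def backlog
    @queue.size
  end

  # Let queued jobs finish, then stop every worker.
  def shutdown
    @workers.size.times { @queue << nil }
    @workers.each(&:join)
  end
end

pool = WorkerPool.new(2)
3.times { |i| pool.schedule { |latency| puts "job #{i} waited #{latency.round(3)}s" } }
pool.shutdown
```

Under load, schedule returns immediately and the backlog grows instead of the process spawning more threads, which is the tradeoff described above. A real pool would also rescue exceptions inside the worker loop so that one failing job does not kill its thread.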

Typically, a SuperIMAP box is resource-limited by the number of user processes that can be started. SuperIMAP requires 2 user processes for each user's IMAP connection. On Heroku, the number of user processes is limited to 256 for a 1X box, 512 for a 2X box, and 32,767 for a PX box. You can set this at home using ulimit -u. Divide this limit in half to get the maximum number of users that a SuperIMAP process can manage.
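
Applying that rule to the Heroku limits listed above gives the following per-process ceilings. The constant and function names are illustrative only.

```ruby
# Each connected user needs 2 user processes, per the paragraph above.
PROCESSES_PER_USER = 2

# Maximum users for a given user-process limit (the value of ulimit -u).
def max_users(process_limit)
  process_limit / PROCESSES_PER_USER
end

{ '1X' => 256, '2X' => 512, 'PX' => 32_767 }.each do |dyno, limit|
  puts "#{dyno}: #{max_users(limit)} users"
end
# 1X: 128 users
# 2X: 256 users
# PX: 16383 users
```

These are upper bounds. The process also runs a handful of other threads, so the practical number is somewhat lower.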

FiveStreet.com uses SuperIMAP to manage thousands of users and process over 1M incoming emails per week (as of January 2015). We currently run this load on a single Heroku PX dyno, with plenty of headroom. Our SuperIMAP instance serving thousands of users requires just 10 database connections, uses about 3GB of RAM, and has a 0.50 load average. The work queue usually sits near 0, with a latency of < 0.5 seconds.

Why Ruby?

At first glance, and from a purely technical point-of-view, Ruby is a poor choice for an application like SuperIMAP. SuperIMAP is highly concurrent, and Ruby is bad at concurrency.

Specifically, the imap_client process spawns what could technically be described as a "boatload" of threads (2 threads per connected user, plus a handful of other threads). Ruby threads are heavyweight, so the interpreter has to burn significant resources just to create and schedule the threads before it can do any real work.

Using Erlang, Go, or Rust (all of which support lightweight threads and actor-style programming) would have made the concurrent bits of SuperIMAP less tricky to write, and would have required fewer computing resources, possibly allowing a single box to handle tens of thousands of active users.

So, why Ruby? A few reasons:

Side Note: This was a deeply considered choice. I (Rusty Klophaus, the author of SuperIMAP) spent about 4 years writing Erlang professionally. It's a fascinating language.

Appendix

Understanding OAuth 2.0

SuperIMAP can authenticate to email providers using OAuth 2.0. OAuth 2.0 can be difficult to understand. Here is how it works, in the context of a user authenticating to Gmail through SuperIMAP:

On your application:

On your SuperIMAP instance:

On Google:

On your SuperIMAP instance:

On your application:

To get one step more complicated, OAuth is secured in a few different ways:

If you want more detail, here's a video tutorial: https://www.youtube.com/watch?v=tFYrq3d54Dc

The OAuth settings are configured through the Google Developer console: https://console.developers.google.com/

Understanding IMAP

Once you are authenticated to an IMAP server, IMAP itself is a fairly straightforward protocol. It consists of simple plain text commands and responses. The commands and responses are tagged, allowing multiple commands to run in parallel.

Below is a sample IMAP session, taken directly from the Internet Message Access Protocol RFC (3501):

    S:   * OK IMAP4rev1 Service Ready
    C:   a001 login mrc secret
    S:   a001 OK LOGIN completed
    C:   a002 select inbox
    S:   * 18 EXISTS
    S:   * FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
    S:   * 2 RECENT
    S:   * OK [UNSEEN 17] Message 17 is the first unseen message
    S:   * OK [UIDVALIDITY 3857529045] UIDs valid
    S:   a002 OK [READ-WRITE] SELECT completed
    C:   a003 fetch 12 full
    S:   * 12 FETCH (FLAGS (\Seen) INTERNALDATE "17-Jul-1996 02:44:25 -0700"
         RFC822.SIZE 4286 ENVELOPE ("Wed, 17 Jul 1996 02:23:25 -0700 (PDT)"
         "IMAP4rev1 WG mtg summary and minutes"
         (("Terry Gray" NIL "gray" "cac.washington.edu"))
         (("Terry Gray" NIL "gray" "cac.washington.edu"))
         (("Terry Gray" NIL "gray" "cac.washington.edu"))
         ((NIL NIL "imap" "cac.washington.edu"))
         ((NIL NIL "minutes" "CNRI.Reston.VA.US")
          ("John Klensin" NIL "KLENSIN" "MIT.EDU")) NIL NIL
          "<B27397-0100000@cac.washington.edu>")
          BODY ("TEXT" "PLAIN" ("CHARSET" "US-ASCII") NIL NIL "7BIT" 3028 92))
    S:    a003 OK FETCH completed
    C:    a004 fetch 12 body[header]
    S:    * 12 FETCH (BODY[HEADER] {342}
    S:    Date: Wed, 17 Jul 1996 02:23:25 -0700 (PDT)
    S:    From: Terry Gray <gray@cac.washington.edu>
    S:    Subject: IMAP4rev1 WG mtg summary and minutes
    S:    To: imap@cac.washington.edu
    S:    cc: minutes@CNRI.Reston.VA.US, John Klensin <KLENSIN@MIT.EDU>
    S:    Message-Id: <B27397-0100000@cac.washington.edu>
    S:    MIME-Version: 1.0
    S:    Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
    S:
    S:    )
    S:    a004 OK FETCH completed
    C:    a005 store 12 +flags \deleted
    S:    * 12 FETCH (FLAGS (\Seen \Deleted))
    S:    a005 OK +FLAGS completed
    C:    a006 logout
    S:    * BYE IMAP4rev1 server terminating connection
    S:    a006 OK LOGOUT completed

SuperIMAP uses the IDLE command to wait for incoming email, defined in the IMAP4 Idle Command RFC (2177). The IMAP client sends an IDLE command to the server and awaits a response. When an IMAP connection is in IDLE mode, no other commands are allowed.
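
Ruby's standard Net::IMAP library exposes IDLE directly. The sketch below shows the general shape of waiting for new mail with it. It is not taken from the SuperIMAP codebase, and both helper names are hypothetical.

```ruby
require 'net/imap'

# True when an untagged server response reports a new message count,
# which is how a server announces new mail while the client is idling.
def new_mail_notification?(response)
  response.is_a?(Net::IMAP::UntaggedResponse) && response.name == 'EXISTS'
end

# Block in IDLE until the server reports new mail.
# Assumes `imap` is an authenticated Net::IMAP connection with a
# mailbox already selected.
def wait_for_new_mail(imap)
  imap.idle do |response|
    # idle_done ends IDLE mode so that other commands can be sent again.
    imap.idle_done if new_mail_notification?(response)
  end
end
```

After wait_for_new_mail returns, the connection is back in normal mode and can fetch the new messages. RFC 2177 also advises clients to end and re-issue IDLE at least every 29 minutes so that the server does not drop the connection for inactivity.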

Development Tasks

The information below is intended mainly for developers who want to modify the SuperIMAP codebase.

Running Unit Tests

Run this once:

    RAILS_ENV=test rake db:setup db:seed

Then run all tests:

    rake test:all

Running Stress Tests

The stress test exercises the multi-threaded aspects of SuperIMAP, as well as the error recovery code. To do this, we point the SuperIMAP IMAP client code at a local IMAP server and generate a bunch of fake emails for many users.

Additionally, the IMAP server generates 'chaotic' events; it will intentionally generate incorrect or gibberish responses. The SuperIMAP IMAP client code is expected to recover gracefully while using a minimal amount of system resources.

Run this once:

    RAILS_ENV=stress rake db:setup db:seed

Then run the stress test:

    script/stress-test

Future Work

Contributions

To contribute to this project, please fork and file a pull request. Small patches will be accepted more quickly than large patches.

Changes

Version 0.1.2

License

The MIT License (MIT)

Copyright (c) 2015 Rusty Klophaus / FiveStreet.com

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.