singer-io / tap-s3-csv

GNU Affero General Public License v3.0

tap-s3-csv

This is a Singer tap that reads data from files located inside a given S3 bucket and produces JSON-formatted data following the Singer spec.

How to use it

tap-s3-csv works together with any Singer target to move data from S3 to that target's destination.

Install and Run

First, make sure Python 3 is installed on your system; if it is not, follow the installation instructions for Mac or Ubuntu.

It's recommended to use a virtualenv:

 python3 -m venv ~/.virtualenvs/tap-s3-csv
 source ~/.virtualenvs/tap-s3-csv/bin/activate
 pip install -U pip setuptools
 pip install -e '.[dev]'

Configuration

Here is an example of a basic config, followed by an explanation of the tables property:

{
    "start_date": "2017-11-02T00:00:00Z",
    "account_id": "1234567890",
    "role_name": "role_with_bucket_access",
    "bucket": "my-bucket",
    "external_id": "my_optional_secret_external_id",
    "tables": "[{\"search_prefix\":\"exports\",\"search_pattern\":\"my_table\\\\/.*\\\\.csv\",\"table_name\":\"my_table\",\"key_properties\":\"id\",\"date_overrides\":\"created_at\",\"delimiter\":\",\"}]",
    "request_timeout": 300
}

The tables field consists of one or more objects, JSON-encoded as an array and then escaped with backslashes (e.g., \" for " and \\ for \) so that the whole array fits inside a JSON string. Each object describes how to find files and emit records. A more detailed (and unescaped) example is below:

[
    {
        "search_prefix": "exports",
        "search_pattern": "my_table\\/.*\\.csv",
        "table_name": "my_table",
        "key_properties": "id",
        "date_overrides": "created_at",
        "delimiter": ","
    },
    ...
]
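
Escaping the array by hand is error-prone, so one option is to let a JSON library do it. The following is a minimal sketch, not part of the tap itself: it writes the table definitions as ordinary Python data and serializes them twice, once to produce the string stored in tables and once to produce the config file text. The values are copied from the examples above; the variable names are only illustrative.

```python
import json

# Table definitions as plain Python data (values taken from the example above).
# The raw string holds the regex exactly as the regex engine should see it.
tables = [
    {
        "search_prefix": "exports",
        "search_pattern": r"my_table\/.*\.csv",
        "table_name": "my_table",
        "key_properties": "id",
        "date_overrides": "created_at",
        "delimiter": ",",
    }
]

config = {
    "start_date": "2017-11-02T00:00:00Z",
    "account_id": "1234567890",
    "role_name": "role_with_bucket_access",
    "bucket": "my-bucket",
    "external_id": "my_optional_secret_external_id",
    # First serialization: the array becomes a compact JSON string.
    "tables": json.dumps(tables, separators=(",", ":")),
    "request_timeout": 300,
}

# Second serialization: writing the config escapes the quotes and
# backslashes inside the "tables" string automatically.
config_text = json.dumps(config, indent=4)
print(config_text)
```

Decoding works the same way in reverse: parse the config, then parse the tables string to get the array back.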

A sample configuration is available in config.sample.json.
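
Before running the tap, it can help to confirm that the tables string in a config decodes cleanly. The sketch below is a hypothetical helper, not something the tap provides: check_tables is an invented name, and treating search_pattern and table_name as the required keys is an assumption rather than the tap's documented validation rule.

```python
import json
import re


def check_tables(config):
    """Decode config["tables"] and return the table definitions.

    Hypothetical helper: raises ValueError if the string is not valid JSON,
    is not a non-empty array, or contains an uncompilable search_pattern.
    """
    # The field is a JSON string, so it needs its own decoding step.
    # json.JSONDecodeError is a subclass of ValueError.
    tables = json.loads(config["tables"])
    if not isinstance(tables, list) or not tables:
        raise ValueError("'tables' must decode to a non-empty JSON array")
    for table in tables:
        # Assumption: these two keys are the minimum needed to find files.
        for field in ("search_pattern", "table_name"):
            if field not in table:
                raise ValueError(f"table entry is missing {field!r}")
        try:
            re.compile(table["search_pattern"])
        except re.error as exc:
            raise ValueError(f"invalid search_pattern: {exc}") from exc
    return tables


# Inline stand-in for a parsed config file, so the sketch runs on its own.
sample = {
    "tables": json.dumps(
        [{"search_pattern": r"my_table\/.*\.csv", "table_name": "my_table"}]
    )
}
print(check_tables(sample)[0]["table_name"])
```

In practice the config dictionary would come from parsing the config file with json.load instead of the inline sample.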


Copyright © 2018 Stitch