industrydive / fileflow

Airflow plugin to transfer arbitrary files between operators
http://fileflow.readthedocs.io/en/latest/
Apache License 2.0

fileflow


Fileflow is a collection of modules that support data transfer between Airflow tasks via file targets and dependencies, backed by either the local file system or S3. The concept is inherited from other pipelining systems, such as Make, Drake, Pydoit, and Luigi, that organize pipeline dependencies around file targets. In some ways this is an alternative to Airflow's XCom system, but it supports arbitrarily large and arbitrarily formatted data, whereas XCom can only pass a pickled value no larger than the backend database's BLOB (binary large object) implementation allows.
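To make the file-target idea concrete, here is a minimal sketch in plain Python of two tasks that hand data to each other through files instead of XCom. It does not use fileflow's own API: the helper names (target_path, write_target, upstream_task, downstream_task) and the directory layout (base_dir/dag_id/task_id/run_date) are assumptions chosen for illustration, and only the local file system case is shown.

```python
import json
import tempfile
from pathlib import Path


def target_path(base_dir, dag_id, task_id, run_date):
    """Return a deterministic output path for one task run.

    The dag_id/task_id/run_date layout is a hypothetical convention,
    not necessarily the one fileflow uses.
    """
    return Path(base_dir) / dag_id / task_id / run_date


def write_target(path, text):
    """Write a task's output file, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


def upstream_task(base_dir, run_date):
    """Produce some data and store it as this task's file target."""
    rows = [{"id": 1, "value": 10}, {"id": 2, "value": 32}]
    out = target_path(base_dir, "example_dag", "extract", run_date)
    write_target(out, json.dumps(rows))


def downstream_task(base_dir, run_date):
    """Read the upstream task's file target and write a summary target."""
    upstream = target_path(base_dir, "example_dag", "extract", run_date)
    rows = json.loads(upstream.read_text())
    total = sum(row["value"] for row in rows)
    out = target_path(base_dir, "example_dag", "summarize", run_date)
    write_target(out, str(total))
    return total


if __name__ == "__main__":
    # A temporary directory stands in for the storage root.
    with tempfile.TemporaryDirectory() as base:
        upstream_task(base, "2017-01-01")
        print(downstream_task(base, "2017-01-01"))
```

Because each path is derived only from the DAG, task, and run date, a downstream task can locate its input without any value passing through the Airflow metadata database, so the size and format of the file are not constrained by it.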

Installation

Install from git with pip: pip install git+https://github.com/industrydive/fileflow.git#egg=fileflow
