mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License
2.07k stars, 100 forks

use module mara-storage for file handling #55

Closed ice1e0 closed 2 years ago

ice1e0 commented 3 years ago

This PR adds an abstraction layer to all file and directory handling via a new separate module mara-storage.

Version 0.9.0 of the module supports only local file storage, but it is built with the idea of supporting other storage backends such as AWS S3, GCS, or Hadoop in the future.
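To illustrate the kind of abstraction layer described here, below is a minimal sketch in Python. It is not the mara-storage API: the names `Storage`, `LocalStorage`, and `copy_text_file` are hypothetical and chosen only for this example. The point is that pipeline code talks to an interface, so a backend for S3, GCS, or Hadoop could be added later without touching the callers.

```python
import pathlib
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical storage interface (an assumption, not the mara-storage API)."""

    @abstractmethod
    def write_text(self, path: str, content: str) -> None:
        """Write text to a path relative to the storage root."""

    @abstractmethod
    def read_text(self, path: str) -> str:
        """Read text from a path relative to the storage root."""

    @abstractmethod
    def exists(self, path: str) -> bool:
        """Return True if the path exists in this storage."""


class LocalStorage(Storage):
    """Backend for the local file system, the only case covered in this sketch."""

    def __init__(self, base_path):
        self.base_path = pathlib.Path(base_path)

    def _resolve(self, path: str) -> pathlib.Path:
        # All paths are interpreted relative to the storage root.
        return self.base_path / path

    def write_text(self, path: str, content: str) -> None:
        target = self._resolve(path)
        # Create missing parent directories so callers need not care about them.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")

    def read_text(self, path: str) -> str:
        return self._resolve(path).read_text(encoding="utf-8")

    def exists(self, path: str) -> bool:
        return self._resolve(path).exists()


def copy_text_file(storage: Storage, source: str, destination: str) -> None:
    """Pipeline-side helper that depends only on the interface, not the backend."""
    storage.write_text(destination, storage.read_text(source))
```

A remote backend would implement the same three methods, and `copy_text_file` would work unchanged.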

This PR is expected to be merged for version 3.2.0. If not, adjustments are required. Some functions and classes are marked as deprecated because they are, or can be, replaced by functions in the mara-storage module. I expect the deprecated functions and classes to be removed in the next major version, 4.0.0, so that no breaking changes are made in 3.x.
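For readers unfamiliar with this deprecation pattern, here is a minimal sketch of how a function can stay available in 3.x while warning callers that it goes away in 4.0.0. The decorator `deprecated` and the functions `read_lines` and `read_lines_from_storage` are hypothetical names for illustration; they are not taken from this PR.

```python
import functools
import warnings


def deprecated(replacement: str, removed_in: str = "4.0.0"):
    """Hypothetical decorator: keep the old function working, but warn on use."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in "
                f"{removed_in}; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


def read_lines_from_storage(text):
    """Stand-in for a replacement function living in the new module."""
    return text.splitlines()


@deprecated(replacement="read_lines_from_storage")
def read_lines(text):
    """Old entry point, kept for 3.x and delegating to the replacement."""
    return read_lines_from_storage(text)
```

Existing callers of `read_lines` keep working and see a `DeprecationWarning`, which gives them the whole 3.x cycle to migrate.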

mara-storage is covered by several unit tests. I would love to see corresponding unit tests in the mara-pipelines module. I did not test all of the functionality in the pipeline module, and I hope the community can help out with that.

ice1e0 commented 3 years ago

upgrade + rebase of branch

Is anything blocking this from being merged into master? I have not received much feedback. Is anyone already using this in production?