utdemir / distributed-dataset

A distributed data processing framework in Haskell.
BSD 3-Clause "New" or "Revised" License

Utility functions to read different file formats #19

Open utdemir opened 5 years ago

utdemir commented 5 years ago

Currently, we expect users to write a Conduit to read data from external sources. This is quite easy, but it would be even better to provide some combinators for common formats and storage systems, e.g. JSON, CSV, gzip and Parquet for formats, and HDFS, S3 and HTTP for storage.
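
To make the idea concrete, here is a rough sketch of what one such combinator might look like for newline-delimited JSON. The name `jsonLines` is hypothetical and not part of distributed-dataset. It is an ordinary Conduit built from the `conduit` and `aeson` packages, and silently skipping lines that fail to parse is just one possible design choice.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Conduit (ConduitT, runConduitPure, sinkList, yieldMany, (.|))
import Data.Aeson (FromJSON, Value, decodeStrict)
import Data.ByteString (ByteString)
import qualified Data.Conduit.Combinators as C

-- Hypothetical combinator (an assumption, not an existing API):
-- split the incoming byte stream into lines, then decode each line as JSON.
-- Lines that fail to decode are dropped.
jsonLines :: (Monad m, FromJSON a) => ConduitT ByteString a m ()
jsonLines = C.linesUnboundedAscii .| C.concatMap decodeStrict

main :: IO ()
main = do
  -- The chunks deliberately split a record across a chunk boundary,
  -- as would happen when streaming from a file or a socket.
  let chunks = ["{\"a\":1}\n{\"a\"", ":2}\nnot json\n"] :: [ByteString]
      rows = runConduitPure (yieldMany chunks .| jsonLines .| sinkList) :: [Value]
  mapM_ print rows
```

A user would put this after whatever Conduit produces the raw bytes (a file, an HTTP response, a gzip decoder), so format combinators and storage combinators could be mixed freely.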

Almost all of them already have libraries on Hackage providing Conduits we can use directly. However, it is not desirable to increase our dependency footprint a lot. So maybe we should hide them behind a flag, or create many small libraries (e.g. distributed-dataset-json, distributed-dataset-gzip).

Relevant: #8