utdemir / distributed-dataset

A distributed data processing framework in Haskell.
BSD 3-Clause "New" or "Revised" License

SQL execution #20

Status: open. Opened by utdemir 5 years ago.

utdemir commented 5 years ago

This is one of the more exciting features.

Apache Spark supports running SQL queries at runtime in an untyped fashion, which is quite useful for exploring data or running ad-hoc queries. See: https://spark.apache.org/docs/latest/sql-programming-guide.html

We should be able to implement a function like runSQL :: String -> Dataset Row -> Dataset Row, where Row is an untyped data structure that, like aeson's Value, can represent arbitrary products.
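To make the proposed shape a little more concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical and not part of distributed-dataset: the Row constructors, the Query type, field, and runQuery are illustrative names only. Plain lists stand in for Dataset, and parsing the SQL string into a Query is left out, so this only covers the "execute" half of runSQL.

```haskell
-- Hypothetical sketch only: an untyped Row (modelled loosely on aeson's
-- Value) plus an interpreter for a tiny query AST. Plain lists stand in
-- for distributed-dataset's Dataset; turning SQL text into a Query is omitted.
import qualified Data.Map.Strict as M

-- Hypothetical untyped value type, similar in spirit to aeson's Value.
data Row
  = RNull
  | RBool Bool
  | RNumber Double
  | RString String
  | RObject (M.Map String Row)
  deriving (Show, Eq)

-- Hypothetical query AST, roughly: SELECT cols WHERE col = literal.
data Query = Query
  { selectCols :: [String]
  , whereEq    :: Maybe (String, Row)
  } deriving Show

-- Look up a column in an object row; missing columns read as RNull.
field :: String -> Row -> Row
field k (RObject m) = M.findWithDefault RNull k m
field _ _           = RNull

-- Interpret the query one row at a time: a filter followed by a projection.
-- Both are per-row operations, so a distributed Dataset could in principle
-- run them independently on each partition.
runQuery :: Query -> [Row] -> [Row]
runQuery q = map project . filter keep
  where
    keep r = case whereEq q of
      Nothing     -> True
      Just (c, v) -> field c r == v
    project r = RObject (M.fromList [ (c, field c r) | c <- selectCols q ])

main :: IO ()
main = do
  let people =
        [ RObject (M.fromList [("name", RString "ada"), ("age", RNumber 36)])
        , RObject (M.fromList [("name", RString "bob"), ("age", RNumber 25)])
        ]
      q = Query { selectCols = ["name"], whereEq = Just ("age", RNumber 25) }
  -- Prints: RObject (fromList [("name",RString "bob")])
  mapM_ print (runQuery q people)
```

Since Row is untyped, a column that is absent from a row simply reads as RNull instead of failing to typecheck. That is the trade-off the issue describes: flexibility for exploration in exchange for losing the static guarantees of a typed Dataset.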

If we implement this in distributed-dataset, then with some modifications we might even be able to run queries interactively from GHCi or IHaskell.