typelevel / frameless

Expressive types for Spark.
Apache License 2.0

Type Spark’s Structured Streaming #232

Open OlivierBlanvillain opened 6 years ago

OlivierBlanvillain commented 6 years ago

We are currently missing these two Dataset methods:

They require some understanding of Spark streaming to be properly typed and tested. Here is the relevant documentation for anyone interested in getting started on that:

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
https://databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html

etspaceman commented 5 years ago

+1 - This was a big blocker for us adopting Frameless, as most of our jobs are structured streaming jobs.

kyprifog commented 5 years ago

I'm curious why this never took off. My guess is that most Typelevel people are using fs2 instead of Spark streaming, but fs2 is still limited in that it can't do distributed streaming out of the box. Maybe Typelevel people are using Flink instead, but that seems doubtful given how Flink is engineered.

This article is interesting. Has anyone tried to extend this approach into the fs2/frameless world?

http://mandubian.com/2014/02/13/zpark/