univalence / zio-spark

A functional wrapper around Spark to make it works with ZIO
https://univalence.github.io/zio-spark/
Apache License 2.0
41 stars 9 forks source link

Support for latest version of Spark #416

Open alcpinto opened 6 months ago

alcpinto commented 6 months ago

When using zio-spark for writing spark Structured Streaming jobs, I noticed that some APIs are missing in zio-spark DataStreamWriter (e.g.: foreachBatch API is missing).

I can use a workaround to solve this problem. From a zio-spark DataFrame/Dataset I can access the underlying DataFrame/Dataset and use the normal DataStreamWriter provided by Spark. However, ideally having those methods available in zio-spark would make the code cleaner. (see an example below)

...
writer  <- ZIO.attempt {
                    df.underlying.writeStream
                        .foreachBatch { (batch: UnderlyingDataset[EntityTxEdgeKinesisTransform], batchId: Long) =>
                                 val cached    = batch.cache()
                                 log.info(s"Processing ${cached.count()} records in batch $batchId")
                                 ...
                         }
                         .option("checkpointLocation", "/my/checkpointing/location")
                         .outputMode(OutputMode.Update())
                         .queryName("my_query")
                 }
                 .orElseFail(BuildDataStreamWriteError("Error building a DataStreamWriter with a foreach batch"))  
withTrigger = writer.trigger(Trigger.ProcessingTime(500))
qry         = withTrigger.start
...

I wonder if generating the zio-spark code against the latest Spark version would solve this issue.

ahoy-jon commented 6 months ago

👋 Updated a lot of the versions in this commit https://github.com/univalence/zio-spark/commit/0ec5549602671c25a6292514ca7ddddd2b18e991

I cannot find the method "writeStream", maybe it needs some work to create a wrapper or some mappings.

alcpinto commented 6 months ago

Thank you for this quick update! Great...

alcpinto commented 6 months ago

Thanks for making this changes available in master.

Since these changes were already merged into master, is it possible to publish a jar for each of the supported Scala versions (2.12, 2.13, 3)? There's only a published jar for 2.13.

Thanks!

ahoy-jon commented 6 months ago

True, to me, it's not deployed at all (we need to update the keeps for maven central)

image

Are you using the release on github?

ahoy-jon commented 6 months ago

@alcpinto it's here : https://github.com/univalence/mavenRepository I will have to put that on a real solution very soon.

alcpinto commented 6 months ago

@ahoy-jon Thanks so much for looking into this.