kaiko-ai / typedspark

Column-wise type annotations for pyspark DataFrames
Apache License 2.0

Add cache support #376 #377

Closed jonmclean closed 5 months ago

jonmclean commented 6 months ago

This pull request adds support for cache operations to the DataSet object.

Fixes #376
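
For readers unfamiliar with the idea, here is a minimal sketch of what "cache support" on a typed wrapper can look like. It is not the code from this PR. `PlainFrame`, `TypedFrame` and `Person` are hypothetical stand-ins (for a Spark DataFrame, a schema-annotated DataSet and a schema), so the example runs without Spark. The assumption is that the cache methods are overridden only so that their return type keeps the schema parameter.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class PlainFrame:
    """Hypothetical stand-in for a Spark DataFrame (not the real class)."""

    def __init__(self) -> None:
        self.is_cached = False

    def cache(self) -> "PlainFrame":
        self.is_cached = True
        return self

    def unpersist(self) -> "PlainFrame":
        self.is_cached = False
        return self


class TypedFrame(PlainFrame, Generic[T]):
    """Hypothetical stand-in for a schema-annotated DataSet."""

    def cache(self) -> "TypedFrame[T]":
        # Delegate to the parent, but declare the typed return value so a
        # type checker still sees TypedFrame[T] after caching.
        super().cache()
        return self

    def unpersist(self) -> "TypedFrame[T]":
        super().unpersist()
        return self


class Person:
    """Hypothetical schema marker used only as a type parameter."""


df: TypedFrame[Person] = TypedFrame()
cached = df.cache()  # statically typed as TypedFrame[Person], not PlainFrame
```

Without the overrides, a type checker would infer `PlainFrame` for `df.cache()`, and the column annotations would be lost for everything downstream of the cache call.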

jonmclean commented 6 months ago

If possible, I would like to backport this to a version that supports Spark 3.4 as well. My current project is on Spark 3.4 because the infrastructure it runs on does not support 3.5 yet.

The code changes in this PR should work on Spark 3.4 and prior versions without modifications.

I would appreciate advice on how the project maintainers would prefer the backport to be done.

nanne-aben commented 5 months ago

Thanks @jonmclean, this is a really useful contribution!

Wrt backporting, it should already work with Spark 3.4 out of the box. We don't explicitly test older versions of Spark in the CI/CD because we don't expect any trouble there. Of course, if you do see a reason why we'd need to do so, we can easily add that. Let me know what you think!

Happy to merge this btw! Lemme know if you wanna change anything wrt backporting or if we can go ahead and merge.

jonmclean commented 5 months ago

Let's go ahead and merge. I reviewed the changes over the last few releases and there does not appear to be anything specific to Spark 3.5.

nanne-aben commented 5 months ago

Thank you for your contribution!