twosigma / flint

A Time Series Library for Apache Spark
Apache License 2.0

How to make Flint work with Spark 3.0 #83

Open yitao-li opened 4 years ago

yitao-li commented 4 years ago

Hi, n00b question here: given how awesome and popular Flint has been, I'm really interested in making it work with Spark 3.0.

So I went ahead and tried the changes in https://github.com/twosigma/flint/pull/82/files, which made Flint build successfully against Spark 3.0.0-preview2, but now some tests are failing (see flint-spark-3.0.0-build.log), and I'm not sure how to fix the failures or how much work fixing them might take.

Any idea how to resolve the test failures? Thanks in advance!

yitao-li commented 4 years ago

Update: I just realized that most (if not all) of the test failures are due to org.apache.spark.sql.execution.ColumnarToRowExec not being handled in isPartitionPreservingUnaryNode. I think the same applies to RowToColumnarExec as well.
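For reference, a minimal sketch of what the fix might look like, assuming isPartitionPreservingUnaryNode pattern-matches on SparkPlan nodes; the isKnownPartitionPreserving helper below is a hypothetical stand-in for Flint's existing cases, not actual Flint code:

```scala
import org.apache.spark.sql.execution.{ColumnarToRowExec, RowToColumnarExec, SparkPlan}

object PartitionPreservingCheck {
  // Hypothetical stand-in for Flint's existing partition-preserving checks.
  private def isKnownPartitionPreserving(plan: SparkPlan): Boolean = false

  def isPartitionPreservingUnaryNode(plan: SparkPlan): Boolean = plan match {
    // Spark 3.0 inserts these nodes to convert between columnar and row-based
    // execution; they only change the data layout, not the partitioning.
    case _: ColumnarToRowExec | _: RowToColumnarExec => true
    case other => isKnownPartitionPreserving(other)
  }
}
```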

dgrnbrg commented 3 years ago

I'm interested in this as well. Please post anything else you find.

yitao-li commented 3 years ago

@dgrnbrg Hey, at the moment I have a branch named 'flint-spark-3.0' in my unofficial fork of Flint that contains all the changes I'm aware of that are needed to make Flint work with Spark 3.0.

You can view the source code at https://github.com/yitao-li/flint/tree/flint-spark-3.0 and browse the commit history to inspect the changes I made on top of the original Flint source code.

As far as I know, all summarizer and ASOF join functionality works as expected with Spark 3.0 after my changes (in other words, it looks like everything I have changed so far was only to make the compiler happy when building against Spark 3.0, with no functional changes).
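For anyone who wants to sanity-check the branch, here is a rough smoke-test sketch of that functionality based on Flint's documented TimeSeriesRDD API; the input DataFrames and column names ("time", "id", "price") are assumed for illustration and are not taken from the branch itself:

```scala
import scala.concurrent.duration.MILLISECONDS

import org.apache.spark.sql.DataFrame

import com.twosigma.flint.timeseries.{Summarizers, TimeSeriesRDD}

def smokeTest(priceDF: DataFrame, quoteDF: DataFrame): Unit = {
  // Both DataFrames are assumed to already be sorted by their "time" column.
  val priceTSRdd = TimeSeriesRDD.fromDF(priceDF)(isSorted = true, timeUnit = MILLISECONDS)
  val quoteTSRdd = TimeSeriesRDD.fromDF(quoteDF)(isSorted = true, timeUnit = MILLISECONDS)

  // Summarizer: sum of "price" per "id".
  val summed = priceTSRdd.summarize(Summarizers.sum("price"), Seq("id"))

  // ASOF join: for each price row, attach the most recent quote row within one day.
  val joined = priceTSRdd.leftJoin(quoteTSRdd, tolerance = "1day")

  summed.toDF.show()
  joined.toDF.show()
}
```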