saurfang / spark-sas7bdat

Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL
http://spark-packages.org/package/saurfang/spark-sas7bdat
Apache License 2.0

support spark 3.0 #59

Closed thesuperzapper closed 4 years ago

thesuperzapper commented 4 years ago

This PR implements support for Spark 3.0 (Scala 2.12)

To get this merged we will need to:

thesuperzapper commented 4 years ago

All sbt tests pass except com.github.saurfang.sas.spark.DefaultSourceSuite, which fails because it thinks path is not being passed to com.github.saurfang.sas.spark.DefaultSource.checkPath() by the CREATE TABLE ... step.

Feel free to tinker with it if anyone wants to.
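For context, here is a minimal sketch of what a checkPath-style validation typically looks like in a Spark data source. This is hypothetical, simplified code, not the actual DefaultSource implementation; it only illustrates why the failure surfaces in checkPath when the CREATE TABLE ... step does not forward the path option:

```scala
// Hypothetical, simplified sketch (not the real DefaultSource code), assuming
// the relation's options arrive as a Map[String, String]. If the CREATE TABLE
// ... USING ... step fails to forward the "path" option into this map, the
// error surfaces here rather than in the SAS reader itself.
object PathCheck {
  def checkPath(parameters: Map[String, String]): String =
    parameters.getOrElse("path", sys.error("'path' must be specified."))
}
```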

thesuperzapper commented 4 years ago

Can someone check whether the jar compiled for Spark 3.0.0 is usable with Spark 2.4.X (if you compile for Scala 2.11, obviously)?

NOTE: run sbt +assembly to compile for multiple versions
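For reference, cross-building in sbt is driven by crossScalaVersions; sbt +assembly then runs the assembly task once per listed Scala version. A sketch of the relevant build.sbt fragment (version numbers are illustrative, not necessarily what this project pins):

```scala
// build.sbt fragment (illustrative versions): `sbt +assembly` runs assembly
// once per entry in crossScalaVersions, producing one jar per Scala version.
scalaVersion := "2.12.10"
crossScalaVersions := Seq("2.11.12", "2.12.10")

// Pick the Spark version per Scala version, since Spark 3.x ships no 2.11 build.
libraryDependencies += {
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 11)) => "org.apache.spark" %% "spark-sql" % "2.4.5" % Provided
    case _             => "org.apache.spark" %% "spark-sql" % "3.0.0" % Provided
  }
}
```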

thesuperzapper commented 4 years ago

Actually, that will fail, because Maven Central has no Scala 2.11 jars for Spark 3.0.0.

I think we should consider dropping support for Spark 2.X with version 3.X of this library.

(And just tell people to use library version 2.X if they use Spark 2.X.)

thesuperzapper commented 4 years ago

@saurfang thoughts about dropping Spark 2.X?

Tagar commented 4 years ago

My two cents: +1 on dropping Spark 2.x.

Scala 2.12 support in Spark 2.4 was experimental, as per the release notes: https://spark.apache.org/releases/spark-release-2-4-0.html. Databricks, for example, never released a runtime with Spark 2.x and Scala 2.12.

@srowen can comment here better.

If somebody needs to use spark-sas7bdat with Spark 2.x, they can just reference the current version; the new version would then be for Spark 3.x.

Thanks!

srowen commented 4 years ago

So, I think ideally you simply set up the build matrix to test Scala 2.11 + Spark 2.4, and Scala 2.12 + Spark 3. That is trivial to declare with Travis; I can show you examples.
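A build matrix of that shape in .travis.yml might look like the following sketch (the environment variable name is an assumption; the sbt build would need to honour it when selecting the Spark dependency):

```yaml
# Sketch of a Travis build matrix (SPARK_VERSION is an illustrative variable
# the build would need to read; versions are examples).
language: scala
jdk: openjdk8
matrix:
  include:
    - scala: 2.11.12
      env: SPARK_VERSION=2.4.5
    - scala: 2.12.10
      env: SPARK_VERSION=3.0.0
script: sbt ++$TRAVIS_SCALA_VERSION test
```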

If this is a library that doesn't change much, it's coherent to say users of Spark 2 can use current versions and users of Spark 3 can use future versions, and just leave it at that. However, I think the code cross-compiles readily.

What I would suggest dropping now is support for Scala 2.10 and Spark 2.3 or earlier. No real point going forward.

Tagar commented 4 years ago

+1 to what Sean said - this looks like a much better option.

saurfang commented 4 years ago

+1 on setting up a build matrix for 2.4 and 3.0 if we can fix the tests to work for both. Otherwise I'm at peace with dropping support for 2.x, since I don't think we have any plans for major features anyway.

pkolli-caredx commented 4 years ago

@saurfang Could you please build the JAR? I don't have permission to build it.

srowen commented 4 years ago

@pkolli-caredx what do you mean you don't have permission? You can pull this branch and build it. If it's hard, I could do it for you, but it should be that simple.

pkolli-caredx commented 4 years ago

@srowen Could you please build the JAR and share it? That would be great.

srowen commented 4 years ago

@pkolli-caredx try this assembly JAR: https://drive.google.com/file/d/1xj3XYjdzmNTtqnra96pPIdgYcdXVpyBd/view?usp=sharing
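For anyone trying that assembly JAR, usage would look something like this sketch (the jar file name and data path are illustrative; the format name follows the package's namespace):

```shell
# Start spark-shell with the assembly jar on the classpath (jar name illustrative),
# then read a .sas7bdat file through the package's data source:
spark-shell --jars spark-sas7bdat-assembly.jar
# Inside the shell:
#   val df = spark.read.format("com.github.saurfang.sas.spark").load("/data/demo.sas7bdat")
#   df.show()
```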

thesuperzapper commented 4 years ago

@saurfang I got this working with a proper test matrix in Travis; I just need you to:

  1. merge this
  2. release 3.0.0 and update NEWS.md
    • there will be two .jar files now: one for Spark 3 / Scala 2.12 and one for Spark 2 / Scala 2.11

saurfang commented 4 years ago

Much appreciated as always, @thesuperzapper. I will get the release going this weekend.