mdrakiburrahman / rakirahman.me

💻Personal blog powered by Gatsby
https://www.rakirahman.me
MIT License

Azure Schema Registry with Spark 2.4 #3

Closed: ramvalasa closed this issue 3 years ago

ramvalasa commented 3 years ago

Hi Raki Rahman,

Your blog post "Exploring Azure Schema Registry with Spark 3" is very helpful and impressive. I'm sure it will make Spark and Azure integrations easier. I was just wondering whether you were able to connect to Azure Schema Registry using the Spark 2.4.x APIs instead of Spark 3.x?

Thanks,
Ram Valasa

mdrakiburrahman commented 3 years ago

Hi @ramvalasa, thanks for reaching out.

Whether the library I used in the article works has less to do with the Spark version and more to do with the Scala runtime.

If you look at line 21 of the library JAR's pom.xml:

```xml
<scala.version>2.12.6</scala.version>
```
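To check which Scala line a given library targets, you can read the `scala.version` property straight out of its pom.xml. The Python sketch below does that with the standard library's `xml.etree.ElementTree`. The helper name `scala_version_from_pom` is hypothetical, and it assumes the property sits under `<properties>` in a POM that uses Maven's usual `http://maven.apache.org/POM/4.0.0` namespace (with a fallback for POMs without a namespace). It does not resolve `${...}` references or parent POMs.

```python
import xml.etree.ElementTree as ET

# Default namespace that Maven POM files normally declare (an assumption about the input).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def scala_version_from_pom(pom_xml: str) -> str:
    """Return the text of <properties><scala.version> from a pom.xml string.

    Hypothetical helper: it does not resolve ${...} references or parent POMs.
    """
    root = ET.fromstring(pom_xml)
    # Namespaced lookup first, then a fallback for POMs without a namespace.
    node = root.find("m:properties/m:scala.version", POM_NS)
    if node is None:
        node = root.find("properties/scala.version")
    if node is None or not (node.text or "").strip():
        raise ValueError("pom.xml does not declare a <scala.version> property")
    return node.text.strip()


if __name__ == "__main__":
    sample = """<project xmlns="http://maven.apache.org/POM/4.0.0">
      <properties>
        <scala.version>2.12.6</scala.version>
      </properties>
    </project>"""
    print(scala_version_from_pom(sample))  # 2.12.6
```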

The highest Databricks Runtime version that ships Spark 2.4 is DBR 6.4, which comes with Scala 2.11.12. Although open-source Spark 2.4.0 has experimental Scala 2.12 support, I'm not aware of a runtime environment that has both Spark 2.4 and Scala 2.12 installed. You won't find that combination on Databricks, and probably not in other cloud Spark offerings either; you'd have to build your own Spark cluster from the ground up with this unusual setup.
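The reasoning above boils down to a version check: Scala 2.x releases are binary compatible only within the same `major.minor` line (for example, 2.12.x with 2.12.x), so a JAR built against Scala 2.12.6 should not be expected to load on DBR 6.4's Scala 2.11.12, whatever the Spark version. Here is a minimal Python sketch of that check. The helper names are hypothetical, and it compares only Scala lines, ignoring Spark API differences and transitive dependencies.

```python
def scala_binary_version(full_version: str) -> str:
    """Reduce a full Scala version such as '2.12.6' to its binary line '2.12'."""
    major, minor = full_version.strip().split(".")[:2]
    return f"{major}.{minor}"


def library_matches_runtime(library_scala: str, runtime_scala: str) -> bool:
    """True when both versions belong to the same Scala 2.x binary line."""
    return scala_binary_version(library_scala) == scala_binary_version(runtime_scala)


# Versions quoted in this thread: the library's pom.xml and Databricks Runtime 6.4.
LIBRARY_SCALA = "2.12.6"
DBR_6_4_SCALA = "2.11.12"

if __name__ == "__main__":
    print(library_matches_runtime(LIBRARY_SCALA, DBR_6_4_SCALA))  # False
    print(library_matches_runtime(LIBRARY_SCALA, "2.12.10"))      # True
```

On a running cluster, a PySpark notebook can usually read the runtime's Scala version through the internal py4j gateway, e.g. `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()`. Because `_jvm` is a private attribute, treat this as a convenience rather than a supported API.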

That being said, I'm sure the library author could refactor the existing library to target Scala 2.11 instead, but that build probably wouldn't see much use, since most folks are on Scala 2.12 anyway. (I'm no Scala expert, but Scala 2.11 and 2.12 are not binary compatible, so at a minimum the library would have to be recompiled against 2.11.)

So basically, to test this in Spark 2.4: