rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io

Add support for different platform versions... #47

Open luckyasser opened 7 years ago

luckyasser commented 7 years ago

As of now, Rheem supports one version of a specific platform per Rheem distribution. Though it might be of little value and high complexity to support multiple running versions of a platform in a single JVM (due to runtime conflicts, etc.), it is still useful to support (and maintain) different versions of execution platforms. For example, not all users can immediately migrate their clusters to Spark 2.0.

This issue is to discuss how we should approach platform versioning in general. One approach is to make the Platform (or the Plugin?) class aware of the version of its underlying backend, and allow the user to specify the version of the platform at the application level. This approach has the following advantages:

A second approach is to use some Maven tricks: specifying the version of the platform in the pom file would include in the build the modules that relate to that platform version. This is probably a lot faster to implement.
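To make the first approach (a version-aware platform selected at the application level) a bit more concrete, here is a minimal Java sketch. Every name in it (`PlatformVersion`, `VersionedPlatform`, `PlatformRegistry`, the module names) is hypothetical and not part of Rheem's actual API. It assumes that compatibility is decided by the backend's major version only, and that only the single resolved platform would actually be loaded into the JVM.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; none of this is Rheem's actual API.

/** A backend version such as "2.1.0"; only major and minor are kept in this sketch. */
final class PlatformVersion {
    final int major;
    final int minor;

    PlatformVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Parses "2.1.0" or "2.1"; the patch level is ignored. */
    static PlatformVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected <major>.<minor>[.<patch>], got: " + text);
        }
        return new PlatformVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }
}

/** Describes a platform build that targets one major version of its backend. */
final class VersionedPlatform {
    final String name;
    final int supportedMajor;
    final String moduleName; // assumed: the module that holds the backend-specific code

    VersionedPlatform(String name, int supportedMajor, String moduleName) {
        this.name = name;
        this.supportedMajor = supportedMajor;
        this.moduleName = moduleName;
    }
}

/** Maps (platform name, major version) to the matching platform description. */
final class PlatformRegistry {
    private final Map<String, VersionedPlatform> platforms = new LinkedHashMap<>();

    void register(VersionedPlatform platform) {
        platforms.put(platform.name + ":" + platform.supportedMajor, platform);
    }

    /** Resolves the version requested by the application; empty if unsupported. */
    Optional<VersionedPlatform> resolve(String name, String requestedVersion) {
        PlatformVersion version = PlatformVersion.parse(requestedVersion);
        return Optional.ofNullable(platforms.get(name + ":" + version.major));
    }
}

final class VersionedPlatformDemo {
    public static void main(String[] args) {
        PlatformRegistry registry = new PlatformRegistry();
        registry.register(new VersionedPlatform("spark", 1, "example-spark1"));
        registry.register(new VersionedPlatform("spark", 2, "example-spark2"));

        // The application asks for the version its cluster runs.
        String module = registry.resolve("spark", "2.1.0")
                .map(p -> p.moduleName)
                .orElse("unsupported");
        System.out.println(module);
    }
}
```

With the build-level approach, the same choice would instead be made once at compile time, so no registry like this would be needed at runtime.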

There's currently a branch for Rheem compiled for Spark 2.1.0:

https://github.com/rheem-ecosystem/rheem/tree/Rheem-Spark2.0

sekruse commented 7 years ago

Spark, for instance, handles different Hadoop versions at the build level.