swoop-inc / spark-alchemy

https://swoop-inc.github.io/spark-alchemy/
Apache License 2.0

spark-alchemy

Spark Alchemy is a collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive in our demanding petabyte-scale environment with rich data (thousands of columns).

Supported languages

While spark-alchemy, like Spark itself, is written in Scala, much of its functionality, such as interoperable HyperLogLog functions, can be used from other Spark-supported languages such as SparkSQL and Python.
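As a rough illustration of the HyperLogLog functions mentioned above, here is a minimal Scala sketch. It assumes the import path com.swoop.alchemy.spark.expressions.hll.functions and the function names hll_init_agg and hll_cardinality as described in the project's HyperLogLog documentation; check both against the version you install. The object name, app name, and column aliases are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.approx_count_distinct
// Assumed import path, taken from the project's HyperLogLog documentation.
import com.swoop.alchemy.spark.expressions.hll.functions._

// Hypothetical example object; not part of spark-alchemy.
object HllSketchExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hll-sketch-example")
      .getOrCreate()
    import spark.implicits._

    // Compare Spark's built-in approximate distinct count with the
    // cardinality of an HLL sketch aggregated over the same column.
    spark.range(100000)
      .select(
        approx_count_distinct($"id").as("builtin_estimate"),
        hll_cardinality(hll_init_agg($"id")).as("sketch_estimate")
      )
      .show()

    spark.stop()
  }
}
```

Unlike the built-in function, which returns only the final estimate, the sketch produced by hll_init_agg is an ordinary binary column, so it can be stored and processed further before its cardinality is computed.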

Installation

Add the following to your libraryDependencies in SBT:

libraryDependencies += "com.swoop" %% "spark-alchemy" % "1.0.1"

You can find all released versions here.

Some use cases, such as interoperability with PySpark, may require a fat JAR of spark-alchemy. To build one, run sbt assembly. To skip tests during assembly, run sbt 'set sbt.Keys.test in assembly := {}' assembly instead.

For Spark users

For Spark framework developers

For Python developers

What we hope to open source in the future, if we have the bandwidth

Development

Build docs microsite

sbt "project docs" makeMicrosite

Run docs microsite locally (run under docs/target/site folder)

jekyll serve -b /spark-alchemy

More details

More from Swoop

Community & contributing

Contributions and feedback of any kind are welcome. Please create an issue and/or pull request.

Spark Alchemy is maintained by the team at Swoop. If you'd like to contribute to our open-source efforts, by joining our team or from your company, let us know at spark-interest at swoop dot com.

License

spark-alchemy is Copyright © 2018-2020 Swoop, Inc. It is free software, and may be redistributed under the terms of the LICENSE.