hablapps / doric

Type safety for Spark columns
https://www.hablapps.com/doric/
Apache License 2.0

doric

Type-safe columns for Spark DataFrames!



Doric offers type safety in DataFrame column expressions at minimal cost, without compromising performance. In particular, doric allows you to:

You'll get all these goodies:

User guide

Please check out this notebook for usage examples and the rationale behind doric (it can also be run on Binder).

You can also check out our documentation page.
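
doric's real API is covered in the notebook and the documentation. To make the underlying idea concrete without pulling in Spark, here is a self-contained toy in plain Scala. Every name in it (`TypedColumnsSketch`, `SqlTypeOf`, `TypedCol`, `resolve`, `plus`) is hypothetical and is not part of doric: a column reference carries its expected Scala type as a type parameter, combining columns of the wrong type is rejected by the compiler, and checking a reference against a schema returns an error value instead of throwing.

```scala
// Toy illustration of typed column references. Hypothetical names, NOT doric's API.
object TypedColumnsSketch {
  // Simplified runtime types, standing in for the data types of a DataFrame schema.
  sealed trait SqlType
  case object IntType extends SqlType
  case object StringType extends SqlType

  // Type class linking a Scala type to the runtime type a column must have.
  trait SqlTypeOf[T] { def sqlType: SqlType }
  object SqlTypeOf {
    implicit val intSqlType: SqlTypeOf[Int] = new SqlTypeOf[Int] { def sqlType: SqlType = IntType }
    implicit val stringSqlType: SqlTypeOf[String] = new SqlTypeOf[String] { def sqlType: SqlType = StringType }
  }

  // A column reference whose expected Scala type T is tracked by the compiler.
  final case class TypedCol[T](name: String)(implicit val ev: SqlTypeOf[T])

  type Schema = Map[String, SqlType]

  // Check a reference against a schema, returning an error value instead of throwing.
  def resolve[T](col: TypedCol[T], schema: Schema): Either[String, TypedCol[T]] =
    schema.get(col.name) match {
      case None                                     => Left(s"column '${col.name}' not found")
      case Some(actual) if actual == col.ev.sqlType => Right(col)
      case Some(actual)                             => Left(s"column '${col.name}' is $actual, expected ${col.ev.sqlType}")
    }

  // Only Int columns can be added: plus(TypedCol[Int]("a"), TypedCol[String]("b")) does not compile.
  def plus(a: TypedCol[Int], b: TypedCol[Int]): String = s"(${a.name} + ${b.name})"
}
```

With a schema such as `Map("age" -> IntType)`, resolving `TypedCol[Int]("age")` gives a `Right`, while `TypedCol[String]("age")` gives a `Left` describing the mismatch. The toy only mimics the idea at the level of plain Scala types; it does not touch Spark.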

Installation

Fetch the JAR from Maven:

sbt

libraryDependencies += "org.hablapps" %% "doric_3-2" % "0.0.7"

Maven

<dependency>
    <groupId>org.hablapps</groupId>
    <artifactId>doric_3-2_2.12</artifactId>
    <version>0.0.7</version>
</dependency>
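
For a project that also compiles against Spark, a build.sbt along the following lines keeps both dependencies side by side. This is a sketch under assumptions: marking spark-sql as `Provided` assumes your cluster supplies Spark at runtime, and 3.2.2 is simply the tested Spark release that matches the `doric_3-2` artifact. Your `scalaVersion` must be one of the Scala versions listed for that Spark release (2.12 or 2.13 for 3.2.2).

```scala
// build.sbt (sketch). ASSUMPTION: Spark is provided by the runtime environment.
val sparkVersion = "3.2.2" // a tested Spark release for the doric_3-2 artifact

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  "org.hablapps"     %% "doric_3-2" % "0.0.7"
)
```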

Doric depends on Spark internals, and it has been tested against the following Spark versions.

Spark   Scala         doric release
2.4.x   2.11          Maven Central (deprecated)
3.0.0   2.12          -
3.0.1   2.12          -
3.0.2   2.12          Maven Central
3.1.0   2.12          -
3.1.1   2.12          -
3.1.2   2.12          -
3.1.3   2.12          Maven Central
3.2.0   2.12          -
3.2.1   2.12          -
3.2.2   2.12 / 2.13   Maven Central
3.3.0   2.12 / 2.13   -
3.3.1   2.12 / 2.13   -
3.3.2   2.12 / 2.13   -
3.3.3   2.12 / 2.13   -
3.3.4   2.12 / 2.13   Maven Central
3.4.0   2.12 / 2.13   -
3.4.1   2.12 / 2.13   -
3.4.2   2.12 / 2.13   Maven Central
3.5.0   2.12 / 2.13   -
3.5.1   2.12 / 2.13   Maven Central
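
The compatibility table above can be read as a lookup from a Spark line and a Scala binary version to an artifact name. The following sketch encodes the rows that have a published release. It rests on an assumption: the `doric_<major>-<minor>_<scala>` naming is extrapolated from the `doric_3-2_2.12` artifact shown in the installation section, and `DoricArtifactSketch` / `artifactFor` are hypothetical helpers, not part of doric. It also ignores that earlier patch releases may list fewer Scala versions (for example, 3.2.0 is listed with 2.12 only).

```scala
// Sketch: derive doric artifact names from the compatibility table.
// ASSUMPTION: the "doric_<major>-<minor>_<scala>" pattern is extrapolated from doric_3-2_2.12.
object DoricArtifactSketch {
  // Scala binary versions of the published release in each Spark line.
  private val scalaBinaries: Map[String, Set[String]] = Map(
    "2.4" -> Set("2.11"), // deprecated line
    "3.0" -> Set("2.12"),
    "3.1" -> Set("2.12"),
    "3.2" -> Set("2.12", "2.13"),
    "3.3" -> Set("2.12", "2.13"),
    "3.4" -> Set("2.12", "2.13"),
    "3.5" -> Set("2.12", "2.13")
  )

  def artifactFor(sparkVersion: String, scalaBinary: String): Either[String, String] = {
    // Keep only "major.minor", e.g. "3.2.2" -> "3.2".
    val line = sparkVersion.split('.').take(2).mkString(".")
    scalaBinaries.get(line) match {
      case None => Left(s"Spark $sparkVersion is not in the tested list")
      case Some(scalas) if !scalas.contains(scalaBinary) =>
        Left(s"Scala $scalaBinary is not listed for Spark $line")
      case Some(_) => Right(s"doric_${line.replace('.', '-')}_$scalaBinary")
    }
  }
}
```

For example, `artifactFor("3.2.2", "2.12")` yields `Right("doric_3-2_2.12")`, the Maven artifact from the installation section. In sbt, `%%` appends the Scala binary suffix automatically, which is why the sbt line only names `doric_3-2`.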

Contributing

Doric is intended to offer a type-safe version of the whole Spark Column API. Please check the list of open issues and help us achieve that goal!

Please read the contribution guide 📋