twosigma / flint

A Time Series Library for Apache Spark
Apache License 2.0

Method not found when creating time series #53

Closed geoHeil closed 5 years ago

geoHeil commented 5 years ago

val tsRdd = TimeSeriesRDD.fromDF(dataFrame = df)(isSorted = true, timeUnit = MILLISECONDS)

throws

scala> val tsRdd = TimeSeriesRDD.fromDF(dataFrame = cellFeed)(isSorted = true, timeUnit = MILLISECONDS)
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution$.apply$default$2()Lscala/Option;
  at com.twosigma.flint.timeseries.TimeSeriesStore$.isClustered(TimeSeriesStore.scala:149)
  at com.twosigma.flint.timeseries.TimeSeriesStore$.apply(TimeSeriesStore.scala:64)
  at com.twosigma.flint.timeseries.TimeSeriesRDD$.fromDFWithPartInfo(TimeSeriesRDD.scala:509)
  at com.twosigma.flint.timeseries.TimeSeriesRDD$.fromDF(TimeSeriesRDD.scala:304)
  ... 52 elided

on Spark 2.2 when trying to create the initial RDD.

Minimal reproducible sample:

import spark.implicits._
import com.twosigma.flint.timeseries.TimeSeriesRDD
import scala.concurrent.duration._
val df = Seq((1, 1, 1L), (2, 3, 1L), (1, 4, 2L), (2, 2, 2L)).toDF("id", "value", "time")
val tsRdd = TimeSeriesRDD.fromDF(dataFrame = df)(isSorted = true, timeUnit = MILLISECONDS)

on Spark 2.2 via HDP 2.6.4
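
The missing method in the stack trace, `apply$default$2`, follows the naming the Scala compiler uses for the accessor of a default value on the second parameter of `apply`. That suggests the Flint jar was compiled against a Spark release whose `ClusteredDistribution` has such a parameter, while the Spark on the classpath does not. A minimal sketch for checking this from the same shell is below; the object and helper names (`ClusteredDistributionCheck`, `hasMethod`) are made up for illustration and are not part of Flint or Spark.

```scala
import scala.util.Try

// Hypothetical diagnostic helper; not part of Flint or Spark.
object ClusteredDistributionCheck {
  // Companion-object class and synthetic accessor name, both copied from the stack trace.
  val CompanionClass = "org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution$"
  val DefaultArgMethod = "apply$default$2"

  // True if `className` can be loaded and exposes a public method named `methodName`.
  // A class that is not on the classpath yields false instead of an exception.
  def hasMethod(className: String, methodName: String): Boolean =
    Try(Class.forName(className)).toOption
      .exists(_.getMethods.exists(_.getName == methodName))

  // True if the Spark on the classpath provides the accessor the Flint jar expects.
  def sparkHasExpectedAccessor: Boolean = hasMethod(CompanionClass, DefaultArgMethod)
}
```

If `ClusteredDistributionCheck.sparkHasExpectedAccessor` returns false in the shell that throws the error, the Flint build and the Spark runtime are most likely binary-incompatible.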

geoHeil commented 5 years ago

Regular Spark 2.3.2 works fine; regular Spark 2.2.1 also fails with the above exception.

geoHeil commented 5 years ago

When creating a custom build like https://github.com/geoHeil/flint/tree/flint-spark-2.2, the basic example works again with Spark 2.2.

However, there are 2 unit test failures:

- a CSV test (NOTE: this failure also occurs on the regular master branch)
- something related to partition preserving failed because a list was empty and `head` was called

icexelloss commented 5 years ago

Hi!

The current flint version only supports Spark 2.3.
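
Since the failure on an unsupported Spark version only surfaces as a `NoSuchMethodError` deep inside `fromDF`, a job can fail earlier and more readably by checking the Spark version up front. The following is a rough sketch, not something Flint provides: the object `SparkVersionGuard` and its methods are hypothetical, and the supported version (2.3) is taken only from the reply above.

```scala
import scala.util.Try

// Hypothetical guard; not part of Flint.
object SparkVersionGuard {
  // Assumption: 2.3 is the only supported major.minor, per the maintainer's reply.
  val Supported: (Int, Int) = (2, 3)

  // Extracts (major, minor) from strings such as "2.3.2" or a vendor build like "2.2.0.2.6.4.0-91".
  def majorMinor(version: String): Option[(Int, Int)] =
    version.split("\\.").toList match {
      case major :: minor :: _ =>
        Try((major.toInt, minor.takeWhile(_.isDigit).toInt)).toOption
      case _ => None
    }

  // True only when the version parses and matches the supported major.minor.
  def isSupported(version: String): Boolean =
    majorMinor(version).contains(Supported)
}
```

In a Spark shell this could be used as `require(SparkVersionGuard.isSupported(spark.version), s"Flint expects Spark 2.3, found ${spark.version}")` before calling `TimeSeriesRDD.fromDF`, where `spark` is the active `SparkSession`.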