twosigma / flint

A Time Series Library for Apache Spark
Apache License 2.0

NoSuchMethodError: internalCreateDataFrame #55

Closed by emmanuelmillionaer 5 years ago

emmanuelmillionaer commented 5 years ago

Thank you for this amazing library! 🥇 I'm running Spark 2.2.0 and tried to initialize a clock:

clock = clocks.uniform(sqlContext, frequency="1day")

This threw an exception:

py4j.protocol.Py4JJavaError: An error occurred while calling z:com.twosigma.flint.timeseries.Clocks.uniform.
: java.lang.NoSuchMethodError: org.apache.spark.sql.SparkSession.internalCreateDataFrame$default$3()Z
    at org.apache.spark.sql.DFConverter$.toDataFrame(DFConverter.scala:42)
    at com.twosigma.flint.timeseries.clock.Clock.asTimeSeriesRDD(Clock.scala:148)
    at com.twosigma.flint.timeseries.Clocks$.uniform(Clocks.scala:54)
    at com.twosigma.flint.timeseries.Clocks.uniform(Clocks.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

NoSuchMethodError: org.apache.spark.sql.SparkSession.internalCreateDataFrame$default$3()Z

I found this method in the docs: https://spark.apache.org/docs/preview/api/java/org/apache/spark/sql/SparkSession.html#internalCreateDataFrame(org.apache.spark.rdd.RDD,%20org.apache.spark.sql.types.StructType)

However, it seems it's not available in the version I'm running. Could you give me a hint on how to resolve this issue? Thanks! 👍

LeoDashTM commented 5 years ago

@emmanuelmillionaer hello there! I'm glad someone else is trying to use this library. Do you know if there's an active forum anywhere where Flint users can ask questions, share answers, and generally exchange their experiences with the library?

Regarding your example/error: please read the docs; this works for me. Take a look (you're passing the SQLContext where a FlintContext is expected; I'm guessing the start and end timepoints default to min and max respectively):

from ts.flint import FlintContext, clocks
fc = FlintContext(sqlContext)
cl = clocks.uniform(fc, '30s', begin_date_time='2018-8-1 5:55:35', end_date_time='2018-08-01 05:59:05')
print(cl)
print(type(cl))

from ts.flint import _version
_version.get_versions()

FYI, this is running on Databricks (Spark 2.3.1).

Now, I'm not able to join this clock (which I'm assuming is a DataFrame) to another existing DataFrame of mine; the error says joinLeft is not available. Any help with that is appreciated.

LeoDashTM commented 5 years ago

Forgot the output, here it is:

TimeSeriesDataFrame[time: timestamp]
<class 'ts.flint.dataframe.TimeSeriesDataFrame'>
Out[9]: 
{'version': '0.6.0',
 'full-revisionid': '2e56267f357f89a15f2bdf5ba9af83dc34afe75e',
 'error': None,
 'date': '2018-07-25T19:19:11+0000',
 'dirty': False}

icexelloss commented 5 years ago

@emmanuelmillionaer Hi, currently Flint only works with Spark 2.3+.
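Since the NoSuchMethodError only surfaces at runtime, a small guard can fail fast with a clear message instead. This is a hypothetical helper, not part of Flint's API; it just compares the version string (e.g. spark.version) against the 2.3 minimum the maintainer states above:

```python
# Hypothetical helper (not part of Flint): refuse to proceed on a Spark
# build older than 2.3, the minimum Flint supports per the maintainer.
def spark_supports_flint(version: str) -> bool:
    """Return True if the given Spark version string is 2.3 or newer."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 3)

print(spark_supports_flint("2.2.0"))  # False: the reporter's version
print(spark_supports_flint("2.3.1"))  # True: the Databricks setup above
```

In a PySpark session you could call this as `spark_supports_flint(spark.version)` before importing ts.flint.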

icexelloss commented 5 years ago

@LeoDashTM I suspect the clock table is not on the left side? Please create another issue so we don't discuss two different problems here.
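For what it's worth, a temporal left join keeps every row of the left frame and attaches the nearest right-side row within a tolerance, so the clock usually needs to be the left operand. This is a pure-pandas analogy using merge_asof, assumed here only to illustrate those semantics; it is not Flint code:

```python
# Analogy only: like Flint's leftJoin, pandas.merge_asof keeps every row of
# the LEFT frame and matches each one against the right frame as-of a
# tolerance. With the clock on the right instead, clock ticks without a
# match would be dropped, which is usually not what you want.
import pandas as pd

clock = pd.DataFrame({"time": pd.to_datetime(
    ["2018-08-01 05:55:35", "2018-08-01 05:56:05", "2018-08-01 05:56:35"])})
data = pd.DataFrame({
    "time": pd.to_datetime(["2018-08-01 05:55:40", "2018-08-01 05:56:30"]),
    "value": [1.0, 2.0],
})

# Clock on the left: one output row per clock tick, values joined as-of,
# within a 30-second lookback window.
joined = pd.merge_asof(clock, data, on="time",
                       tolerance=pd.Timedelta("30s"))
print(len(joined))  # 3: one row per clock tick, matched or not
```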