twosigma / flint

A Time Series Library for Apache Spark
Apache License 2.0
993 stars 184 forks

pyspark 2.4 support #63

Open mattomatic opened 5 years ago

mattomatic commented 5 years ago

Does this library currently work with spark 2.4?

icexelloss commented 5 years ago

We have not tried it with Spark 2.4 yet.

andompesta commented 5 years ago

It does not work with 2.3.2 either.

icexelloss commented 5 years ago

What issues do you see with 2.3.2? Internally we use Flint with 2.3.2 without issues.

mattomatic commented 5 years ago

https://github.com/twosigma/flint/pull/64

Managed these changes to get it to build under spark 2.4

kunalsingh09 commented 5 years ago

I am running on spark-2.4.0-bin-hadoop2.7 and seeing the error below. Any ideas?

from ts.flint import windows
sp500_previous_day_return = sp500_return.shiftTime(windows.future_absolute_time('1day')).toDF('time', 'previous_day_return')

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/kkum25/anaconda/envs/featuretool/lib/python3.7/site-packages/ts/flint/dataframe.py", line 1591, in shiftTime
    tsrdd = self.timeSeriesRDD.shift(window._jwindow(self._sc))
  File "/Users/kkum25/anaconda/envs/featuretool/lib/python3.7/site-packages/ts/flint/dataframe.py", line 154, in timeSeriesRDD
    self._jdf, self._is_sorted, self._junit, self._time_column)
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o124.fromDF.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution$.apply$default$2()Lscala/Option;
    at com.twosigma.flint.timeseries.TimeSeriesStore$.isClustered(TimeSeriesStore.scala:149)
    at com.twosigma.flint.timeseries.TimeSeriesStore$.apply(TimeSeriesStore.scala:64)
    at com.twosigma.flint.timeseries.TimeSeriesRDD$.fromDFWithPartInfo(TimeSeriesRDD.scala:509)
    at com.twosigma.flint.timeseries.TimeSeriesRDD$.fromDF(TimeSeriesRDD.scala:304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
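
A `NoSuchMethodError` is raised when compiled code calls a method that is missing from the classes loaded at runtime, which typically points to a library jar built against a different Spark version than the one actually in use. Note that the Python paths in the traceback above reference an `apache-spark/2.2.1` install rather than 2.4.0. The following is a minimal sketch for checking which Spark the Python process picks up; the helper names (`spark_major_minor`, `same_spark_line`, `versions_in_paths`) are hypothetical, and the "built for 2.4.0" value is an assumption to be replaced with whatever the Flint jar in use was compiled against.

```python
import re


def spark_major_minor(version):
    """Return (major, minor) parsed from a version string such as '2.4.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised Spark version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_spark_line(runtime_version, built_for_version):
    """True when both versions share the same major.minor release line."""
    return spark_major_minor(runtime_version) == spark_major_minor(built_for_version)


def versions_in_paths(text):
    """Collect Spark versions found in paths like '.../apache-spark/2.2.1/...'."""
    return sorted(set(re.findall(r"apache-spark/(\d+\.\d+\.\d+)", text)))


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        print("pyspark is not importable from this interpreter")
    else:
        print("pyspark version:", pyspark.__version__)
        print("loaded from:", pyspark.__file__)
        # Assumption: the Flint jar in use was built for Spark 2.4.0.
        print("matches 2.4 line:", same_spark_line(pyspark.__version__, "2.4.0"))
```

If the reported version or path does not match the intended Spark distribution, environment settings such as `SPARK_HOME` or `PYTHONPATH` may be pointing at an older install.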

kunalsingh09 commented 5 years ago

#64: "Managed these changes to get it to build under spark 2.4"

What issue were you seeing before this change?

luzlab commented 4 years ago

Are there any blockers to merging this? I'd like to use Flint on Databricks, but I don't see any compatible versions of Spark being offered (2.2.1 and 2.4.[0,1,2] are the only versions of Spark currently available).

I think this is a great project and would love to help mature it!

dgrnbrg commented 4 years ago

I have been successfully using @mattomatic's changes to run on Spark 2.4.

prabhash17 commented 4 years ago

@icexelloss Does this library work with Spark 2.2.x?