Open mattomatic opened 5 years ago
We have not tried it with Spark 2.4 yet. On Wed, Feb 13, 2019 at 3:44 PM mattomatic notifications@github.com wrote:
Does this library currently work with spark 2.4?
It does not work with 2.3.2 either.
What issues do you see with 2.3.2? Internally we use Flint with 2.3.2 without issues. On Fri, Feb 15, 2019 at 1:53 AM Sandro Cavallari notifications@github.com wrote:
It does not work with 2.3.2 as well
https://github.com/twosigma/flint/pull/64
Managed these changes to get it to build under Spark 2.4.
I am running on spark-2.4.0-bin-hadoop2.7 and seeing the error below. Any ideas?
from ts.flint import windows
sp500_previous_day_return = sp500_return.shiftTime(windows.future_absolute_time('1day')).toDF('time', 'previous_day_return')

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/kkum25/anaconda/envs/featuretool/lib/python3.7/site-packages/ts/flint/dataframe.py", line 1591, in shiftTime
    tsrdd = self.timeSeriesRDD.shift(window._jwindow(self._sc))
  File "/Users/kkum25/anaconda/envs/featuretool/lib/python3.7/site-packages/ts/flint/dataframe.py", line 154, in timeSeriesRDD
    self._jdf, self._is_sorted, self._junit, self._time_column)
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o124.fromDF.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution$.apply$default$2()Lscala/Option;
    at com.twosigma.flint.timeseries.TimeSeriesStore$.isClustered(TimeSeriesStore.scala:149)
    at com.twosigma.flint.timeseries.TimeSeriesStore$.apply(TimeSeriesStore.scala:64)
    at com.twosigma.flint.timeseries.TimeSeriesRDD$.fromDFWithPartInfo(TimeSeriesRDD.scala:509)
    at com.twosigma.flint.timeseries.TimeSeriesRDD$.fromDF(TimeSeriesRDD.scala:304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
What issue were you seeing before this change?
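One thing worth checking in the trace above: the py4j and pyspark frames resolve to /usr/local/Cellar/apache-spark/2.2.1, not to the spark-2.4.0-bin-hadoop2.7 install, so the session may be picking up a different Spark than intended. A NoSuchMethodError like this one typically appears when a jar compiled against one Spark version runs on another. Below is a minimal sketch for comparing versions. The helper names (`parse_version`, `same_minor`, `spark_version_from_path`) are hypothetical and not part of Flint or PySpark, and the path regex only assumes the two naming patterns seen in this thread.

```python
import re


def parse_version(version):
    """Turn a version string such as '2.4.0' into a tuple of ints, e.g. (2, 4, 0)."""
    # Drop any suffix after a dash (for example '-SNAPSHOT') before parsing.
    parts = re.findall(r"\d+", version.split("-")[0])
    return tuple(int(p) for p in parts[:3])


def same_minor(runtime, built_against):
    """Return True when both versions share the same major and minor numbers."""
    return parse_version(runtime)[:2] == parse_version(built_against)[:2]


def spark_version_from_path(path):
    """Extract a Spark version embedded in an install path, or None if absent.

    Assumption: the path contains 'apache-spark/<x.y.z>' (Homebrew layout)
    or 'spark-<x.y.z>' (binary distribution name).
    """
    match = re.search(r"(?:apache-spark/|spark-)(\d+\.\d+\.\d+)", path)
    return match.group(1) if match else None


if __name__ == "__main__":
    traced = spark_version_from_path(
        "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/pyspark/sql/utils.py"
    )
    intended = spark_version_from_path("spark-2.4.0-bin-hadoop2.7")
    print("traced:", traced, "intended:", intended)
    print("same minor version:", same_minor(traced, intended))
```

In a live PySpark session, the runtime version string is available as `spark.version` on the SparkSession (or `sc.version` on the SparkContext) and can be passed to `same_minor` in place of the path-derived value.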
Are there any blockers to merging this in? I'd like to use Flint on Databricks, but I don't see any compatible versions of Spark being offered (2.2.1 and 2.4.[0,1,2] are the only versions of Spark currently available).
I think this is a great project and would love to help mature it!
I have been successfully using @mattomatic's changes to run on Spark 2.4.
@icexelloss Does this library work with Spark 2.2.x?
Does this library currently work with spark 2.4?