tensorflow / ecosystem

Integration of TensorFlow with other open-source frameworks
Apache License 2.0

Does this connector work with TF 2.x? #177

Open dgoldenberg-audiomack opened 3 years ago

dgoldenberg-audiomack commented 3 years ago

The latest TF release right now is 2.4.0. The latest connector on Maven Central is 1.15.0, published on Oct 23, 2019.

If I build the connector with the instructions from here: https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector, will it work?

Echo9573 commented 3 years ago

Hi @dgoldenberg-audiomack, when I built the connector against TF 2.0.0, the build failed in the test stage ( link ). How about you?

dgoldenberg-audiomack commented 3 years ago

Hey @Echo9573, I think your error is:

/bin/sh: 1: java: not found

judging by the output. It seems that either Java is not installed or it is not on your PATH. You can also try running Maven with -e and -X to get more information.
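A quick way to confirm the diagnosis above is to check whether the tools are reachable from the shell that runs the build. This is only a sketch: `check_tool` is a hypothetical helper name, not something from the connector repo, and it relies on nothing beyond the POSIX `command -v` builtin.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# The build needs both a JDK and Maven.
check_tool java || echo "Install a JDK or add its bin directory to PATH"
check_tool mvn  || echo "Install Maven or add its bin directory to PATH"

# Once both are found, rerun the build with stack traces (-e) and debug output (-X).
echo "Next step: mvn -e -X clean install"
```

If `java` is reported as missing, the "/bin/sh: 1: java: not found" failure in the test stage is expected, whatever TF version is being built.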

dgoldenberg-audiomack commented 3 years ago

I think the connector works; it would be great if the committers verified this and added a note to the docs.

jukujala commented 3 years ago

I tried to follow the instructions with TF 2.2 and Spark 3.0.1, but installing ecosystem/hadoop fails on a missing dependency, org.tensorflow:proto:jar:2.2.0. The error message: Could not resolve dependencies for project org.tensorflow:tensorflow-hadoop:jar:2.2.0: Could not find artifact org.tensorflow:proto:jar:2.2.0 in central (https://repo.maven.apache.org/maven2)

The build succeeds if I manually set org.tensorflow:proto to version 1.15.0. However, I'm unsure what the impact is of using an old version of org.tensorflow:proto.
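To see which versions of the artifact are actually published, it can help to derive the URL Maven looks up from the coordinates in the error message. The sketch below assumes the standard Maven repository layout (group ID with dots turned into slashes, then artifact ID, version, and `artifact-version.jar`); `artifact_url` is a hypothetical helper name.

```shell
# artifact_url is a hypothetical helper: it prints the Maven Central URL
# for a jar, given its group ID, artifact ID, and version.
artifact_url() {
  group_path=$(printf '%s' "$1" | tr '.' '/')
  printf '%s/%s/%s/%s/%s-%s.jar\n' \
    "https://repo.maven.apache.org/maven2" "$group_path" "$2" "$3" "$2" "$3"
}

# The version the build asked for, and the one that was substituted by hand.
artifact_url org.tensorflow proto 2.2.0
artifact_url org.tensorflow proto 1.15.0

# To probe a URL (needs network access and curl), something like this should work:
#   curl -sfI "$(artifact_url org.tensorflow proto 2.2.0)" >/dev/null \
#     && echo present || echo missing
```

This only shows whether a given version exists in the repository; it says nothing about whether the 1.15.0 proto classes are compatible with TF 2.2.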

For the build I used the Dataproc preview-debian10 master image and these commands:

cd ../../hadoop
mvn versions:set -DnewVersion=2.2.0
mvn clean install
cd ../spark/spark-tensorflow-connector
mvn versions:set -DnewVersion=2.2.0
mvn clean install -Dspark.version=3.0.1