Open · jenniew opened this issue 3 years ago
@ManfeiBai, please help fix these issues.
OK, I will fix it now.
The following error has been fixed in the new PR (https://github.com/intel-analytics/analytics-zoo/pull/4617) and the path has been updated; the other errors are still being worked on: "When running ./ppml/scripts/generate-keys.sh, it gets these errors:
base64: ./keys/keystore.jks: No such file or directory
base64: ./keys/keystore.pkcs12: No such file or directory
base64: ./keys/server.pem: No such file or directory
base64: ./keys/server.crt: No such file or directory
base64: ./keys/server.csr: No such file or directory
base64: ./keys/server.key: No such file or directory"
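For reference, these "No such file or directory" messages indicate that the generated files are not where the script's base64 step expects them. A minimal pre-check before re-running, assuming the script writes into ./keys relative to the current directory (the layout is inferred from the error messages, not from the script itself):

```bash
# Hedged sketch: make sure the expected output directory exists and the tools the
# script depends on are on the PATH before running generate-keys.sh.
mkdir -p ./keys
command -v keytool >/dev/null || echo "keytool not found: install a JDK first"
command -v openssl >/dev/null || echo "openssl not found: install openssl first"
./ppml/scripts/generate-keys.sh
```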
When run "bash work/start-scripts/start-spark-local-sql-sgx.sh", get this error: py4j.protocol.Py4JJavaError: An error occurred while calling o32.json. : org.apache.spark.sql.AnalysisException: Path does not exist: file:/examples/src/main/resources/people.json; at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:558) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.immutable.List.flatMap(List.scala:355) at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:392) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py", line 211, in
When run "bash work/start-scripts/start-spark-local-sql-sgx.sh", also get this error:
21/09/01 22:24:33 INFO DAGScheduler: Job 13 failed: runJob at PythonRDD.scala:153, took 453.136422 s
Traceback (most recent call last):
File "/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py", line 212, in
When run "bash work/start-scripts/start-spark-local-sql-sgx.sh", get this error: py4j.protocol.Py4JJavaError: An error occurred while calling o32.json. : org.apache.spark.sql.AnalysisException: Path does not exist: file:/examples/src/main/resources/people.json; at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:558) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.immutable.List.flatMap(List.scala:355) at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:392) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748)
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py", line 211, in basic_df_example(spark) File "/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py", line 42, in basic_df_example df = spark.read.json("examples/src/main/resources/people.json") File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 274, in json File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line
Path does not exist... please correct the file path
When run "bash work/start-scripts/start-spark-local-sql-sgx.sh", also get this error: 21/09/01 22:24:33 INFO DAGScheduler: Job 13 failed: runJob at PythonRDD.scala:153, took 453.136422 s Traceback (most recent call last): File "/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py", line 212, in schema_inference_example(spark) File "/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py", line 152, in schema_inference_example schemaPeople = spark.createDataFrame(people) File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 746, in createDataFrame File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 390, in _createFromRDD File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 361, in _inferSchema File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1390, in first File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1372, in take File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/context.py", line 1069, in runJob File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 18.0 failed 1 times, most recent failure: Lost task 0.0 in stage 18.0 (TID 209, localhost, executor driver): java.net.SocketException: Broken pipe (Write failed) at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) at java.net.SocketOutputStream.write(SocketOutputStream.java:134) at java.io.DataOutputStream.writeInt(DataOutputStream.java:198) at org.apache.spark.security.SocketAuthHelper.writeUtf8(SocketAuthHelper.scala:112) at org.apache.spark.security.SocketAuthHelper.authToServer(SocketAuthHelper.scala:86) at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:115) at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:133) at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:125) at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:95) at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117) at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:109) at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) at org.apache.spark.rdd.RDD.iterator(RDD.scala:310) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Please fix the previous one first, then try again. Also, did you pass the java -version test on SGX?
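For anyone else hitting the broken-pipe error: the stack trace shows it happens while Spark tries to start and authenticate the Python worker, so a basic sanity check is to confirm the JVM and Python interpreter both start inside the PPML container before re-running the example. A rough sketch (the container name is a placeholder, and the PPML guide's SGX-specific java -version test remains the authoritative check):

```bash
# Hedged sketch: confirm java and python start inside the PPML container before re-running
# the SQL example ("spark-local" is a placeholder container name; adjust to your deployment).
docker exec -it spark-local bash -c "java -version && python --version"
```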
When run "bash work/start-scripts/start-spark-local-sql-sgx.sh", also get this error: 21/09/01 22:24:33 INFO DAGScheduler: Job 13 failed: runJob at PythonRDD.scala:153, took 453.136422 s Traceback (most recent call last): File "/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py", line 212, in schema_inference_example(spark) File "/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py", line 152, in schema_inference_example schemaPeople = spark.createDataFrame(people) File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 746, in createDataFrame File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 390, in _createFromRDD File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/session.py", line 361, in _inferSchema File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1390, in first File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1372, in take File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/context.py", line 1069, in runJob File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/ppml/trusted-big-data-ml/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 18.0 failed 1 times, most recent failure: Lost task 0.0 in stage 18.0 (TID 209, localhost, executor driver): java.net.SocketException: Broken pipe (Write failed) at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) at java.net.SocketOutputStream.write(SocketOutputStream.java:134) at java.io.DataOutputStream.writeInt(DataOutputStream.java:198) at org.apache.spark.security.SocketAuthHelper.writeUtf8(SocketAuthHelper.scala:112) at org.apache.spark.security.SocketAuthHelper.authToServer(SocketAuthHelper.scala:86) at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:115) at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:133) at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:125) at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:95) at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117) at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:109) at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) at org.apache.spark.rdd.RDD.iterator(RDD.scala:310) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
please fix the previous one? then try it again and did you pass the java -version test on SGX?
I fixed the previous path issue manually and tried this script again, then got this error. The first test, "basic_df_example", passed; the second test, "schema_inference_example", failed with this error. Yes, the java -version test passed.
This error has been fixed in the new PR (https://github.com/intel-analytics/analytics-zoo/pull/4627) by adding interactive prompts and hints: "2.3.1. Run ./build-docker-image.sh: if no proxy setting is needed, the script fails. The doc should describe how to run it when no proxy is needed."
This error has been fixed in the new PR (https://github.com/intel-analytics/analytics-zoo/pull/4627) and the new doc: "2.3.2.1. The command cp -r ../keys . cannot be executed because the directory is not correct; it should be cp -r ../../../../keys ."
This error has been fixed in the new PR (https://github.com/intel-analytics/analytics-zoo/pull/4627) and the new doc: "To start the container, first modify the paths in deploy-local-spark-sgx.sh. There is no information about what should be set for these environment variables:
export ENCLAVE_KEY_PATH=YOUR_LOCAL_ENCLAVE_KEY_PATH
export DATA_PATH=YOUR_LOCAL_DATA_PATH
export KEYS_PATH=YOUR_LOCAL_KEYS_PATH
export LOCAL_IP=YOUR_LOCAL_IP"
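For reference, illustrative values only; every path below is an assumption about one possible host layout, not a default shipped with the scripts:

```bash
# Hedged sketch of deploy-local-spark-sgx.sh environment settings (example values, not defaults).
export ENCLAVE_KEY_PATH=/home/user/keys/enclave-key.pem   # SGX enclave signing key on the host
export DATA_PATH=/home/user/ppml/data                     # host directory with input data to mount
export KEYS_PATH=/home/user/ppml/keys                     # directory produced by generate-keys.sh
export LOCAL_IP=192.168.0.112                             # IP address of the host running the container
```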
In the newest analytics-zoo version, the image is published under "intelanalytics/" rather than "10.239.45.10/". This addresses: "Running ./deploy-local-spark-sgx.sh fails with the error: Unable to find image '10.239.45.10/arda/intelanalytics/analytics-zoo-ppml-trusted-big-data-ml-python-graphene:0.11-SNAPSHOT' locally"
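So, assuming the public tag matches the one shown in the error message (use whatever tag the current guide specifies), pulling the published image first should avoid the failure:

```bash
# Hedged sketch: pull the published image instead of relying on the internal registry path
# (the tag below is taken from the error message above and may differ in newer releases).
docker pull intelanalytics/analytics-zoo-ppml-trusted-big-data-ml-python-graphene:0.11-SNAPSHOT
./deploy-local-spark-sgx.sh
```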
The other errors are still being worked on.
A new fix for "keytool: command not found" has been added in PR intel-analytics/analytics-zoo#4627. Please follow this doc to complete the prerequisite step and create the "keys" and "password" directories: https://github.com/ManfeiBai/analytics-zoo/blob/patch-12/docs/readthedocs/source/doc/PPML/Overview/ppml.md#21-prerequisite
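A quick way to confirm the prerequisite step worked before moving on; the expected key file names are taken from the base64 errors reported above, and the password directory contents depend on the doc's script:

```bash
# Hedged sketch: verify the prerequisite outputs exist before starting the container.
ls ./keys      # expect keystore.jks, keystore.pkcs12, server.pem, server.crt, server.csr, server.key
ls ./password  # expect the password files described in the prerequisite doc
```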
When running ./deploy-distributed-standalone-spark.sh, it uses the root user, but on an Azure VM the root user cannot be used. Can we provide a deploy script that works with a non-root sudo user?
deploy-distributed-standalone-spark.sh
Could we run the script on the Azure VM with "sudo deploy-distributed-standalone-spark.sh" rather than "./deploy-distributed-standalone-spark.sh"?
distributed-check-status.sh also needs to support a non-root user.
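Until the scripts officially support this, one possible workaround is to run them from a non-root account that has sudo rights. A sketch only; the scripts themselves may still assume root-owned paths:

```bash
# Hedged sketch: run the existing deploy and status scripts via sudo from a non-root
# account in the sudo group, instead of logging in as root.
chmod +x ./deploy-distributed-standalone-spark.sh ./distributed-check-status.sh
sudo ./deploy-distributed-standalone-spark.sh
sudo ./distributed-check-status.sh
```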
When running the workload on a cluster, we need to replace --master 'local[4]' with lines such as:
--master 'spark://your_master_url' \
--conf spark.authenticate=true \
--conf spark.authenticate.secret=your_secret_key \
What should your_secret_key be set to?
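For reference, spark.authenticate.secret is a shared secret string: the same value has to be used by the driver and the executors (and, on a standalone cluster, by the worker daemons), so any sufficiently random string works. One possible way to generate and pass it, as a sketch; the application file and master URL are placeholders, not the exact flags from the PPML start scripts:

```bash
# Hedged sketch: generate a random shared secret and pass the same value to spark-submit.
SECRET=$(openssl rand -hex 32)
spark-submit \
  --master 'spark://your_master_url' \
  --conf spark.authenticate=true \
  --conf "spark.authenticate.secret=${SECRET}" \
  your_app.py
```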
Thanks, solving it now
The PPML user guide is not clear and lacks some essential information, and there are also errors in the scripts. For example, for 2.1 it runs ./ppml/scripts/generate-keys.sh, which runs keytool, but there is no information about how to install and run the keytool command. When running ./ppml/scripts/generate-keys.sh, it asks for several passwords, but it is not clear what these passwords are for or which of them should match. And when running ./ppml/scripts/generate-keys.sh, it gets these errors:
base64: ./keys/keystore.jks: No such file or directory
base64: ./keys/keystore.pkcs12: No such file or directory
base64: ./keys/server.pem: No such file or directory
base64: ./keys/server.crt: No such file or directory
base64: ./keys/server.csr: No such file or directory
base64: ./keys/server.key: No such file or directory
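For the keytool part specifically: keytool ships with the JDK, so installing any JDK puts it on the PATH. A sketch for a Debian/Ubuntu host; the package name is an assumption, use whatever JDK your distribution provides:

```bash
# Hedged sketch: keytool is bundled with the JDK, so installing a JDK makes it available.
sudo apt-get update && sudo apt-get install -y openjdk-8-jdk
keytool -help   # confirm keytool is on the PATH before running generate-keys.sh
```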
2.3.1. Run ./build-docker-image.sh: if no proxy setting is needed, the script fails. The doc should describe how to run it when no proxy is needed.
2.3.2.1. The command cp -r ../keys . cannot be executed because the relative directory is not correct; it should be cp -r ../../../../keys .
To start the container, first modify the paths in deploy-local-spark-sgx.sh. There is no information about what should be set for these environment variables:
export ENCLAVE_KEY_PATH=YOUR_LOCAL_ENCLAVE_KEY_PATH
export DATA_PATH=YOUR_LOCAL_DATA_PATH
export KEYS_PATH=YOUR_LOCAL_KEYS_PATH
export LOCAL_IP=YOUR_LOCAL_IP
Running ./deploy-local-spark-sgx.sh fails with the error: Unable to find image '10.239.45.10/arda/intelanalytics/analytics-zoo-ppml-trusted-big-data-ml-python-graphene:0.11-SNAPSHOT' locally
I haven't finished all the steps, but there are so many unclear points and errors that the guide is hard to follow. Please check all the steps and fix them.