Open · sgwhat opened this issue 2 years ago
Could you please take a look? @qiuxin2012
This error doesn't happen when I set cluster_mode to spark-submit and run with the spark-submit command instead.
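For reference, a minimal sketch of the working spark-submit path (assuming BigDL Orca's Python API; in older Analytics Zoo releases the import is from zoo.orca rather than bigdl.orca):

    # my_script.py -- with cluster_mode="spark-submit", init_orca_context does
    # not create the Spark configuration itself; it picks up whatever the
    # external spark-submit command provides, e.g.:
    #   spark-submit --master yarn --deploy-mode cluster my_script.py
    from bigdl.orca import init_orca_context, stop_orca_context

    sc = init_orca_context(cluster_mode="spark-submit")
    # ... Orca / Spark workload here ...
    stop_orca_context()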
I found some error messages:
2022-05-17 15:28:15 ERROR ApplicationMaster:91 - User class threw exception: java.io.IOException: Cannot run program "/home/manfei/anaconda3/envs/master/bin/python": error=2, No such file or directory
java.io.IOException: Cannot run program "/home/manfei/anaconda3/envs/master/bin/python": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:100)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 7 more
2022-05-17 15:29:23 INFO ApplicationMaster:54 - Waiting for spark context initialization...
2022-05-17 15:29:23 ERROR ApplicationMaster:91 - User class threw exception: java.io.IOException: Cannot run program "/home/manfei/anaconda3/envs/master/bin/python": error=13, Permission denied
java.io.IOException: Cannot run program "/home/manfei/anaconda3/envs/master/bin/python": error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:100)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 7 more
This error is caused by a wrong PYSPARK_DRIVER_PYTHON environment variable. PYSPARK_DRIVER_PYTHON should be ignored when using yarn-cluster mode, because the driver runs on the cluster rather than on the client machine.
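One quick way to confirm this is to inspect the standard PySpark environment variables on the client before init_orca_context is called (a sketch; the variable names are standard PySpark, nothing Orca-specific):

    import os

    # In yarn-cluster mode the driver runs inside the YARN ApplicationMaster,
    # so a client-local interpreter path such as
    # /home/manfei/anaconda3/envs/master/bin/python may not exist on the node
    # (error=2) or may not be executable there (error=13).
    print(os.environ.get("PYSPARK_DRIVER_PYTHON"))  # inspect the current value
    os.environ.pop("PYSPARK_DRIVER_PYTHON", None)   # or unset it entirely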
When I set cluster_mode to yarn-cluster in init_orca_context() and run as a python script, it fails with the following info:
WARN ScriptBasedMapping:254 - Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 172.16.0.173
ExitCodeException exitCode=1: Fatal Python error: _PyMainInterpreterConfig_Read: memory allocation failed
ValueError: character U+6374652f is not in range [U+0000; U+10ffff]
Current thread 0x00007f1287deb740 (most recent call first):
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:37)
at org.apache.spark.deploy.yarn.YarnAllocator$$anon$1$$anonfun$run$1.apply(YarnAllocator.scala:422)
at org.apache.spark.deploy.yarn.YarnAllocator$$anon$1$$anonfun$run$1.apply(YarnAllocator.scala:421)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.deploy.yarn.YarnAllocator$$anon$1.run(YarnAllocator.scala:421)
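For context, the failing setup is roughly the following (a sketch assuming BigDL Orca's API; the resource values are illustrative, not taken from the report):

    from bigdl.orca import init_orca_context

    # Launched as a plain python script; Orca submits the job to YARN in
    # cluster mode, so the driver runs inside the ApplicationMaster.
    sc = init_orca_context(cluster_mode="yarn-cluster",
                           cores=4, memory="4g", num_nodes=2)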
The ScriptBasedMapping message is just a warning: Spark failed to get the rack info of the node. It's not a blocking error; the job is still running.