Open chongjeeseng opened 4 years ago
👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.
@chongjeeseng Based on a quick online search, this may be a Jupyter issue where SPARK_HOME is not set: https://github.com/jupyter/jupyter/issues/248. I haven't seen this issue before; however, I will leave it open for now in case others hit it as well.
Hi @imatiach-msft, thank you for replying. I followed the link to the Jupyter issue you posted, but unfortunately the SPARK_HOME environment variable appears to be set correctly.
Running the %env magic in Jupyter yields the following environment. As you can see, SPARK_HOME is set to '/usr/lib/spark-current':
'ZOO_LOG_DIR': '/mnt/disk1/log/zookeeper',
'CONDA_SHLVL': '0',
'JAVA_LIBRARY_PATH': '/usr/lib/hadoop-current/lib/native:/usr/lib/bigboot-current/lib:/usr/lib/hadoop-current/lib/native:/usr/lib/bigboot-current/lib:',
'HIVE_CONF_DIR': '/etc/ecm/hive-conf',
'SHELL': '/bin/bash',
'ZEPPELIN_NOTEBOOK_DIR': '/mnt/disk1/zeppelin/notebooks',
'LOGNAME': 'root',
'_': '/opt/anaconda3/bin/jupyter',
'HADOOP_HOME': '/usr/lib/hadoop-current',
'YARN_PID_DIR': '/usr/lib/hadoop-current/pids',
'USER': 'root',
'SUDO_COMMAND': '/bin/su',
'HADOOP_CONF_DIR': '/etc/ecm/hadoop-conf',
'ZEPPELIN_CONF_DIR': '/etc/ecm/zeppelin-conf',
'ZEPPELIN_HOME': '/usr/lib/zeppelin-current',
'YARN_LOG_DIR': '/var/log/hadoop-yarn',
'SPARK_CONF_DIR': '/etc/ecm/spark-conf',
'PIG_HOME': '/usr/lib/pig-current',
'OOZIE_URL': 'http://emr-header-1.cluster-41646:11000/oozie/',
'SHLVL': '1',
'XDG_SESSION_ID': '209',
'OOZIE_CONFIG': '/etc/ecm/oozie-conf',
'HISTTIMEFORMAT': '%d/%m/%y %T ',
'HIVE_HOME': '/usr/lib/hive-current',
'ZOOCFGDIR': '/etc/ecm/zookeeper-conf',
'HADOOP_MAPRED_PID_DIR': '/usr/lib/hadoop-current/pids',
'JAVA_HOME': '/usr/lib/jvm/java-1.8.0',
'HISTSIZE': '1000',
'BIGBOOT_HOME': '/usr/lib/bigboot-current',
'HADOOP_CLASSPATH': '/usr/lib/hadoop-current/lib/*:/usr/lib/tez-current/*:/usr/lib/tez-current/lib/*:/etc/ecm/tez-conf:/usr/lib/hadoop-current/lib/*:/usr/lib/tez-current/*:/usr/lib/tez-current/lib/*:/etc/ecm/tez-conf:/opt/apps/extra-jars/*:/usr/lib/spark-current/yarn/spark-2.4.3-yarn-shuffle.jar:/opt/apps/extra-jars/*:/usr/lib/spark-current/yarn/spark-2.4.3-yarn-shuffle.jar',
'SPARK_HOME': '/usr/lib/spark-current',
'CONDA_ROOT': '/opt/anaconda3',
'HUE_HOME': '/usr/lib/hue-current',
'SQOOP_HOME': '/usr/lib/sqoop-current',
'HADOOP_LOG_DIR': '/var/log/hadoop-hdfs',
'ZOO_LOG4J_PROP': 'INFO,ROLLINGFILE',
'HISTCONTROL': 'ignoredups',
'SUDO_USER': 'user',
'SUDO_GID': '1012',
'PIG_CONF_DIR': '/etc/ecm/pig-conf',
'HOSTNAME': 'emr-header-1.cluster-41646',
'SUDO_UID': '1012',
'SQOOP_CONF_DIR': '/etc/ecm/sqoop-conf',
'LESSOPEN': '||/usr/bin/lesspipe.sh %s',
'SPARK_LOG_DIR': '/mnt/disk1/log/spark',
'FLOW_AGENT_HOME': '/usr/lib/flow-agent-current',
'HADOOP_MAPRED_LOG_DIR': '/var/log/hadoop-mapred',
'FLOW_AGENT_CONF_DIR': '/etc/ecm/flow-agent-conf',
'LIVY_CONF_DIR': '/etc/ecm/livy-conf',
'TEZ_HOME': '/usr/lib/tez-current',
'SPARK_PID_DIR': '/usr/lib/spark-current/pids',
'MAIL': '/var/spool/mail/root',
'JVMFLAGS': ' -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=128M -Xloggc:/mnt/disk1/log/zookeeper/zookeeper-gc.log -javaagent:/var/lib/ecm-agent/data/jmxetric-1.0.8.jar=host=localhost,port=8649,mode=unicast,wireformat31x=true,process=ZOOKEEPER_ZOOKEEPER,config=/var/lib/ecm-agent/data/jmxetric.xml -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=128M -Xloggc:/mnt/disk1/log/zookeeper/zookeeper-gc.log -javaagent:/var/lib/ecm-agent/data/jmxetric-1.0.8.jar=host=localhost,port=8649,mode=unicast,wireformat31x=true,process=ZOOKEEPER_ZOOKEEPER,config=/var/lib/ecm-agent/data/jmxetric.xml',
'USERNAME': 'root',
'OLDPWD': '/root/airflow',
'PWD': '/root',
'LANG': 'en_US.UTF-8',
'TERM': 'xterm-color',
'PATH': '/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/opt/anaconda3/bin:/usr/lib/sqoop-current/bin:/usr/lib/spark-current/bin:/usr/lib/pig-current/bin:/usr/lib/hive-current/hcatalog/bin:/usr/lib/hive-current/bin:/usr/local/sbin:/usr/lib/sqoop-current/bin:/usr/lib/spark-current/bin:/usr/lib/pig-current/bin:/usr/lib/hive-current/hcatalog/bin:/usr/lib/hive-current/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib/bigboot-current/bin:/usr/lib/flow-agent-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/usr/lib/oozie-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/usr/local/bin/:/usr/lib/bigboot-current/bin:/usr/lib/flow-agent-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/usr/lib/oozie-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin',
'LIVY_HOME': '/usr/lib/livy-current',
'OOZIE_HOME': '/usr/lib/oozie-current',
'HCAT_HOME': '/usr/lib/hive-current/hcatalog',
'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:',
'HADOOP_PID_DIR': '/usr/lib/hadoop-current/pids',
'TEZ_CONF_DIR': '/etc/ecm/tez-conf',
'HOME': '/root',
'ZOOKEEPER_HOME': '/usr/lib/zookeeper-current',
'HUE_CONF_DIR': '/etc/ecm/hue-conf',
'JPY_PARENT_PID': '12794',
'CLICOLOR': '1',
'PAGER': 'cat',
'GIT_PAGER': 'cat',
'MPLBACKEND': 'module://ipykernel.pylab.backend_inline',
'PYSPARK_DRIVER_PYTHON': '/home/user/.conda/envs/test_env/bin/python',
'PYSPARK_PYTHON': '/mnt/disk1/user/test_env/bin/python'
I think something else is causing the issue, but the error is not very descriptive beyond the fact that the Java process exited before sending the driver its port number.
EDIT: I would also like to note that starting a normal PySpark session, without adding the repository and the mmlspark package to the config, does not yield any issues.
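The "Java gateway process exited before sending its port number" symptom usually means spark-submit died before the Python driver could connect to it. A minimal environment sanity check is sketched below; the helper name `check_spark_env` is my own, and the items it inspects are only the common suspects, not an exhaustive diagnosis:

```python
import os
import shutil


def check_spark_env():
    """Collect common causes of 'Java gateway process exited before
    sending its port number' into one report dict."""
    return {
        # PySpark needs SPARK_HOME to locate spark-submit
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
        # spark-submit launches a JVM, so java must be resolvable
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "java_on_path": shutil.which("java") is not None,
        # a malformed PYSPARK_SUBMIT_ARGS (e.g. a bad --packages
        # coordinate) can also kill the gateway before it prints a port
        "PYSPARK_SUBMIT_ARGS": os.environ.get("PYSPARK_SUBMIT_ARGS"),
    }


if __name__ == "__main__":
    for key, value in check_spark_env().items():
        print(f"{key}: {value}")
```

Running the launch command by hand (`$SPARK_HOME/bin/spark-submit --version`) in the same shell can also surface the underlying JVM error that the Python-side message hides.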
There are some Livy parameters in your config; are you running this through Apache Livy? If so, to get it working I had to exclude some packages in the configuration (done within the notebook using the %%configure magic):
```
%%configure -f
{
    "name": "mmlspark",
    "conf": {
        "spark.jars.packages": "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1",
        "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.11,org.scalactic:scalactic_2.11,org.scalatest:scalatest_2.11"
    }
}
```
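For sessions launched directly from Python rather than through Livy, roughly the same settings can be applied as a plain Spark conf. A hedged sketch (the `livy_configure_payload` helper is my own; the `SparkSession.builder` lines are left as comments so the snippet runs without a Spark installation):

```python
import json

# The same three settings the %%configure cell sends to Livy,
# expressed as an ordinary Spark conf dict.
MMLSPARK_CONF = {
    "spark.jars.packages": "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": (
        "org.scala-lang:scala-reflect,"
        "org.apache.spark:spark-tags_2.11,"
        "org.scalactic:scalactic_2.11,"
        "org.scalatest:scalatest_2.11"
    ),
}


def livy_configure_payload(name="mmlspark", conf=MMLSPARK_CONF):
    """Render the JSON body equivalent to the %%configure -f cell."""
    return json.dumps({"name": name, "conf": conf}, indent=2)


# Outside Livy, the same dict could be applied when building a session,
# e.g. (requires PySpark, so left commented out):
#   builder = SparkSession.builder.appName("mmlspark")
#   for key, value in MMLSPARK_CONF.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()

if __name__ == "__main__":
    print(livy_configure_payload())
```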
Hello, I ran into the same problem and managed to solve it. You can try the following:

```shell
pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 --repositories https://mmlspark.azureedge.net/maven
```

Hope this works for you!
@chongjeeseng Inspired by this page: https://www.diycode.cc/projects/Azure/mmlspark
I resolved this problem by specifying the Microsoft repository. Thanks.

```python
os.environ['PYSPARK_SUBMIT_ARGS'] = "--master yarn-client --num-executors 1 --executor-memory 1g --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3 --repositories 'https://mmlspark.azureedge.net/maven' pyspark-shell"
```
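One caveat with this approach: PYSPARK_SUBMIT_ARGS has to be set before pyspark is imported, because the Java gateway reads it when it is launched. A minimal sketch (the resource numbers are just the ones from the comment above; the pyspark import is commented out so the snippet does not require a Spark installation):

```python
import os

# Build the submit arguments; the trailing 'pyspark-shell' token is required.
submit_args = " ".join([
    "--master yarn-client",
    "--num-executors 1",
    "--executor-memory 1g",
    "--packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3",
    "--repositories https://mmlspark.azureedge.net/maven",
    "pyspark-shell",
])

# Must be set BEFORE importing pyspark, or the Java gateway is launched
# without the extra package and repository options.
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args

# import pyspark
# spark = pyspark.sql.SparkSession.builder.getOrCreate()

if __name__ == "__main__":
    print(os.environ["PYSPARK_SUBMIT_ARGS"])
```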
When trying to install mmlspark, I encountered an error, and googling it does not seem to yield many results.
Some info about my setup:
The stack trace:
Would gladly appreciate any pointers in tracing the issue.