
spark-Environment #1

Open sustcoder opened 6 years ago

sustcoder commented 6 years ago

Issues encountered while setting up the Spark environment.

sustcoder commented 6 years ago

When importing the project with sbt, the "dump project structure from sbt shell" step can be very slow: the sbt shell window shows "busy" while dependencies are downloaded. Just wait for it to finish; optimization notes to follow...

sustcoder commented 6 years ago
Option 1

Modify the configuration file inside the sbt launcher jar directly. On Windows the files inside the jar can be replaced with 360压缩 (or any archive tool).

  1. Locate the sbt install directory, e.g. D:\ProgramFile\sbt\bin
  2. Back up sbt-launch.jar as sbt-launch.jar.bak
  3. Extract sbt-launch.jar.bak and open the sbt.boot.properties file
  4. Under local in the [repositories] section, add the following repositories:
    alirepo1: https://maven.aliyun.com/repository/central
    alirepo2: https://maven.aliyun.com/repository/jcenter
    alirepo3: https://maven.aliyun.com/repository/public
  5. Open sbt-launch.jar with 360压缩, find sbt.boot.properties, and replace it with the edited copy
Option 2

Configure sbt's repositories in a standalone file so that sbt loads the repositories we configure first (a consolidated sketch of both files follows the steps below).

  1. In the D:\ProgramFile\sbt\conf directory, create a new file named repository.properties
  2. Add the following content to repository.properties:
    [repositories]
    local
    alirepo1: https://maven.aliyun.com/repository/central
    alirepo2: https://maven.aliyun.com/repository/jcenter
    alirepo3: https://maven.aliyun.com/repository/public
  3. In conf/sbtconfig.txt, add the path of the repository.properties file:
    -Dsbt.repository.config=D:/ProgramFile/sbt/conf/repository.properties
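
For reference, a consolidated sketch of the two files from Option 2. The paths follow the example install directory above; the -Dsbt.override.build.repos=true line is an optional extra (not part of the original steps) that tells sbt to ignore repositories declared inside individual builds and use only this file.

D:/ProgramFile/sbt/conf/repository.properties:

```
[repositories]
  local
  alirepo1: https://maven.aliyun.com/repository/central
  alirepo2: https://maven.aliyun.com/repository/jcenter
  alirepo3: https://maven.aliyun.com/repository/public
```

Lines appended to D:/ProgramFile/sbt/conf/sbtconfig.txt:

```
-Dsbt.repository.config=D:/ProgramFile/sbt/conf/repository.properties
-Dsbt.override.build.repos=true
```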
sustcoder commented 6 years ago

Submit command

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client /home/hadoop/app/spark2.2.0/examples/jars/spark-examples_2.11-2.2.0.jar

Error message

Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/09/04 17:01:43 INFO util.ShutdownHookManager: Shutdown hook called

Cause: TODO. Fix: TODO.
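
The root cause is still marked TODO above; since the error says the ApplicationMaster was killed or failed to launch, one way to dig further (not part of the original notes) is to pull the application's aggregated logs from YARN:

```sh
# <applicationId> is a placeholder; take the real id from the ResourceManager UI
# or from the spark-submit/driver output.
yarn logs -applicationId <applicationId>
```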

sustcoder commented 6 years ago

Running spark-shell on Windows fails

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:124)
...

Fix: add the following to the bin/spark-class2.cmd file

set SPARK_DIST_CLASSPATH=%HADOOP_HOME%\etc\hadoop\*;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\mapreduce\lib\*;%HADOOP_HOME%\share\hadoop\mapreduce\*;%HADOOP_HOME%\share\hadoop\tools\lib\*
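
As an alternative sketch (assuming HADOOP_HOME is set and %HADOOP_HOME%\bin\hadoop.cmd works), the classpath can be derived from the hadoop CLI instead of hard-coding every directory, mirroring the export SPARK_DIST_CLASSPATH=$(hadoop classpath) approach used on Linux for "Hadoop-provided" Spark builds:

```bat
rem Alternative for bin\spark-class2.cmd (sketch): build SPARK_DIST_CLASSPATH
rem from "hadoop classpath" instead of listing each share\hadoop directory.
for /f "delims=" %%i in ('%HADOOP_HOME%\bin\hadoop classpath') do set SPARK_DIST_CLASSPATH=%%i
```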
sustcoder commented 6 years ago

Running from IDEA fails

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Fix

  1. Configure the environment in either of the following ways (see the sketch after this list):
    • Add the hadoop and spark directories to the PATH environment variable
    • Set it in code: System.setProperty("hadoop.home.dir", "E:\\data\\gitee\\hadoop-2.6.0");
  2. Install winutils.exe: download it from https://github.com/srccodes/hadoop-common-2.2.0-bin, extract the archive, and copy winutils.exe into hadoop's bin directory.
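
A minimal runnable sketch of the in-code route (the path, object name, and app name are examples; hadoop.home.dir must point at a directory whose bin folder contains winutils.exe, and the property must be set before the SparkSession is created):

```scala
import org.apache.spark.sql.SparkSession

object WinutilsCheck {
  def main(args: Array[String]): Unit = {
    // Example path: <hadoop.home.dir>\bin must contain winutils.exe.
    System.setProperty("hadoop.home.dir", "E:\\data\\gitee\\hadoop-2.6.0")

    val spark = SparkSession.builder()
      .appName("winutils-check")  // example name
      .master("local[*]")         // run locally inside IDEA
      .getOrCreate()

    println(spark.range(10).count()) // quick sanity check; should print 10
    spark.stop()
  }
}
```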
sustcoder commented 6 years ago

Submitting the jar via spark-submit or from IDEA fails

18/09/28 09:41:52 ERROR TaskSchedulerImpl:Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.

Cause

The version on the server does not match the local version, so the serialVersionUIDs produced during serialization disagree:

class incompatible: stream classdesc serialVersionUID 8789839749593513237, local class serialVersionUID = -4145741279224749316

Fix

  1. If the error occurs for a program submitted with spark-submit, check whether the Scala and Spark versions used to build the program locally match the versions on the server.
  2. If those versions already match, double-check the Scala version that Spark itself was built against: when Spark is built with Maven the default Scala version is 2.10, so building against Scala 2.11 has to be requested explicitly, and if the versions differ, adjust the locally compiled Scala version as well. In short, it all comes down to version incompatibility. (A build.sbt sketch follows this list.)
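
A build.sbt sketch of keeping the client-side versions aligned (the version numbers are examples matching the spark-examples_2.11-2.2.0.jar used earlier; substitute whatever Scala/Spark build the cluster actually runs):

```scala
// The Scala version and the _2.11 artifact suffix picked by %% must match the
// Scala build of the Spark installed on the cluster.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.2.0" % "provided"
)
```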

Reference

Fixing the inability to reach the Spark Master when connecting programmatically (解决在编程方式下无法访问 Spark Master 问题)

sustcoder commented 6 years ago

Problems caused by re-formatting (hdfs namenode -format)

Symptom

java.io.IOException: Incompatible clusterIDs in E:\data\hdfsData\data\datanode: 
namenode clusterID = CID-f6ae052a-9745-4710-a3c1-cfab6ef8fc91; datanode clusterID = CID-7bbfeab7-ec7b-4d82-a3f4-e56d53ff7668

Cause

After running hdfs namenode -format, the namenode's current directory is deleted and regenerated, and the clusterID in its VERSION file changes with it, while the clusterID in the datanode's VERSION file stays the same, so the two clusterIDs no longer match.

Fix

To recover from this, after formatting the namenode either delete the datanode's current folder, or edit the clusterID in the datanode's VERSION file so that it matches the clusterID in the namenode's VERSION file, then restart DFS. A sketch of the first option follows.
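
A sketch of the delete-and-restart route on this Windows setup (the data path is taken from the error above; it assumes the start/stop scripts live in %HADOOP_HOME%\sbin, and note that it discards the DataNode's existing block data):

```bat
rem Stop HDFS, drop the DataNode's stale current directory (old clusterID and
rem blocks), then restart so the DataNode re-registers with the new clusterID.
call %HADOOP_HOME%\sbin\stop-dfs.cmd
rmdir /s /q E:\data\hdfsData\data\datanode\current
call %HADOOP_HOME%\sbin\start-dfs.cmd
```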