hadoop@ecs-c04d:~$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
      /_/
Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_275
Branch
Compiled by user jshao on 2018-09-16T12:15:32Z
Revision
Url
Type --help for more information.
hadoop@ecs-c04d:~$ hdfs version
Hadoop 3.1.1
Source code repository https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c
Compiled by leftnoteasy on 2018-08-02T04:26Z
Compiled with protoc 2.5.0
From source with checksum f76ac55e5b5ff0382a9f7df36a3ca5a0
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.1.1.jar
hadoop@ecs-c04d:~$ spark-submit --jars hadoop-huaweicloud-3.1.1-hw-40.jar,esdk-obs-java-3.20.6.1.jar obs_test.py
21/01/03 09:04:55 WARN Utils: Your hostname, ecs-c04d resolves to a loopback address: 127.0.1.1; using 192.168.0.230 instead (on interface eth0)
21/01/03 09:04:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/01/03 09:04:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/01/03 09:04:56 INFO SparkContext: Running Spark version 2.3.2
21/01/03 09:04:56 INFO SparkContext: Submitted application: obs test
21/01/03 09:04:56 INFO SecurityManager: Changing view acls to: hadoop
21/01/03 09:04:56 INFO SecurityManager: Changing modify acls to: hadoop
21/01/03 09:04:56 INFO SecurityManager: Changing view acls groups to:
21/01/03 09:04:56 INFO SecurityManager: Changing modify acls groups to:
21/01/03 09:04:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
21/01/03 09:04:56 INFO Utils: Successfully started service 'sparkDriver' on port 36681.
21/01/03 09:04:56 INFO SparkEnv: Registering MapOutputTracker
21/01/03 09:04:56 INFO SparkEnv: Registering BlockManagerMaster
21/01/03 09:04:56 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/01/03 09:04:56 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/01/03 09:04:56 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e294dec0-2f9d-4f3f-9e7e-3875c2b20d58
21/01/03 09:04:56 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/01/03 09:04:56 INFO SparkEnv: Registering OutputCommitCoordinator
21/01/03 09:04:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/01/03 09:04:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.230:4040
21/01/03 09:04:57 INFO SparkContext: Added JAR file:///home/hadoop/hadoop-huaweicloud-3.1.1-hw-40.jar at spark://192.168.0.230:36681/jars/hadoop-huaweicloud-3.1.1-hw-40.jar with timestamp 1609635897154
21/01/03 09:04:57 INFO SparkContext: Added JAR file:///home/hadoop/esdk-obs-java-3.20.6.1.jar at spark://192.168.0.230:36681/jars/esdk-obs-java-3.20.6.1.jar with timestamp 1609635897155
21/01/03 09:04:57 INFO SparkContext: Added file file:/home/hadoop/obs_test.py at file:/home/hadoop/obs_test.py with timestamp 1609635897166
21/01/03 09:04:57 INFO Utils: Copying /home/hadoop/obs_test.py to /tmp/spark-a51e2865-0465-4a1b-a6c5-1da954078da6/userFiles-6c5a6a09-bdbe-45bb-8629-a2041d237232/obs_test.py
21/01/03 09:04:57 INFO Executor: Starting executor ID driver on host localhost
21/01/03 09:04:57 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39711.
21/01/03 09:04:57 INFO NettyBlockTransferService: Server created on 192.168.0.230:39711
21/01/03 09:04:57 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/01/03 09:04:57 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.230, 39711, None)
21/01/03 09:04:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.230:39711 with 366.3 MB RAM, BlockManagerId(driver, 192.168.0.230, 39711, None)
21/01/03 09:04:57 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.230, 39711, None)
21/01/03 09:04:57 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.230, 39711, None)
21/01/03 09:04:57 INFO EventLoggingListener: Logging events to hdfs://localhost:9000/spark-logs/local-1609635897202
21/01/03 09:04:57 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/hadoop/spark-warehouse').
21/01/03 09:04:57 INFO SharedState: Warehouse path is 'file:/home/hadoop/spark-warehouse'.
21/01/03 09:04:58 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
21/01/03 09:04:58 WARN FileStreamSink: Error while looking for metadata directory.
Traceback (most recent call last):
  File "/home/hadoop/obs_test.py", line 11, in <module>
    df = spark.read.csv("obs://dev-modelarts/kaggle-CTR/data/data/train.csv", header=True, inferSchema=True)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 441, in csv
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o55.csv.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.obs.OBSFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2596)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3320)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3352)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:709)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:390)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:390)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:389)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:596)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.obs.OBSFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2500)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2594)
    ... 30 more
21/01/03 09:04:58 INFO SparkContext: Invoking stop() from shutdown hook
21/01/03 09:04:58 INFO SparkUI: Stopped Spark web UI at http://192.168.0.230:4040
21/01/03 09:04:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/01/03 09:04:58 INFO MemoryStore: MemoryStore cleared
21/01/03 09:04:58 INFO BlockManager: BlockManager stopped
21/01/03 09:04:58 INFO BlockManagerMaster: BlockManagerMaster stopped
21/01/03 09:04:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/01/03 09:04:58 INFO SparkContext: Successfully stopped SparkContext
21/01/03 09:04:58 INFO ShutdownHookManager: Shutdown hook called
21/01/03 09:04:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-a51e2865-0465-4a1b-a6c5-1da954078da6/pyspark-c89583cc-2dd6-419c-ae4a-c7119739455c
21/01/03 09:04:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-c0e74089-56ef-4ee7-8944-446c3bd77482
21/01/03 09:04:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-a51e2865-0465-4a1b-a6c5-1da954078da6
Problem description: After configuring the plugin according to this repository's manual, accessing an OBS path from code still fails with a "Class org.apache.hadoop.fs.obs.OBSFileSystem not found" error.
Environment: Spark 2.3.2, no cluster deployment, invoked directly via spark-submit; Hadoop 3.1.1 in a single-node pseudo-distributed deployment. Spark can access Hadoop normally; see the execution log at the top of this report.
I downloaded hadoop-huaweicloud-3.1.1-hw-40.jar and esdk-obs-java-3.20.6.1.jar and placed them in the Spark and Hadoop dependency directories:
/usr/local/spark/jars/
/usr/local/hadoop/share/hadoop/common/lib/
/usr/local/hadoop/share/hadoop/tools/lib/
/usr/local/hadoop/share/hadoop/hdfs/lib/
core-site.xml contents (the OBS bucket is in the Beijing4 region):
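The actual file contents were not captured in this report. For reference, a hypothetical sketch of the OBS-related properties the hadoop-huaweicloud documentation calls for, assuming the standard property names and the public Beijing4 (cn-north-4) endpoint:

```xml
<!-- Hypothetical sketch only; the reporter's actual core-site.xml was not included. -->
<configuration>
  <property>
    <name>fs.obs.impl</name>
    <value>org.apache.hadoop.fs.obs.OBSFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.obs.impl</name>
    <value>org.apache.hadoop.fs.obs.OBS</value>
  </property>
  <property>
    <name>fs.obs.endpoint</name>
    <!-- Assumed endpoint for the CN North-Beijing4 region -->
    <value>obs.cn-north-4.myhuaweicloud.com</value>
  </property>
  <property>
    <name>fs.obs.access.key</name>
    <value>YOUR_AK</value>
  </property>
  <property>
    <name>fs.obs.secret.key</name>
    <value>YOUR_SK</value>
  </property>
</configuration>
```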
Code:
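The script itself did not survive in this report. The following is a minimal sketch of obs_test.py, reconstructed from line 11 of the traceback and the application name in the driver log; everything except the read.csv call is assumed boilerplate:

```python
# Hypothetical reconstruction of obs_test.py: only line 11 (the read.csv call)
# is confirmed by the traceback; the rest is assumed.
from pyspark.sql import SparkSession

# The driver log shows "Submitted application: obs test"
spark = SparkSession.builder.appName("obs test").getOrCreate()

# Line 11 of the traceback: read a CSV directly from the OBS bucket
df = spark.read.csv("obs://dev-modelarts/kaggle-CTR/data/data/train.csv",
                    header=True, inferSchema=True)
df.show()

spark.stop()
```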
Run command:
spark-submit --jars hadoop-huaweicloud-3.1.1-hw-40.jar,esdk-obs-java-3.20.6.1.jar obs_test.py
Error log: see the full spark-submit output at the top of this report.