oracle / oci-hdfs-connector

HDFS Connector for Oracle Cloud Infrastructure
https://cloud.oracle.com/cloud-infrastructure

Hive external table on OCI Object Storage: select count(*) fails with error: No FileSystem for scheme "oci" #50

Closed · zhengwanbo closed this issue 3 years ago

zhengwanbo commented 3 years ago

Environment:

1. HDP 3.1.4.0-315
2. Hive 3.1.0
3. HDFS connector: oci-hdfs-full-3.3.0.7.0.1

Logs:

```
0: jdbc:hive2://bigdata-hadoop-2.sub070606371> show create table ssb_customer_txt_obj;
DEBUG : Acquired the compile lock.
INFO  : Compiling command(queryId=hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d): show create table ssb_customer_txt_obj
DEBUG : Encoding valid txns info 2040:9223372036854775807:: txnid:2040
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d); Time taken: 3.461 seconds
INFO  : Executing command(queryId=hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d): show create table ssb_customer_txt_obj
INFO  : Starting task [Stage-0:DDL] in serial mode
DEBUG : Task getting executed using mapred tag : hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d,userid=root
INFO  : Completed executing command(queryId=hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d); Time taken: 1.54 seconds
INFO  : OK
DEBUG : Shutting down query show create table ssb_customer_txt_obj
+----------------------------------------------------------------+
|                         createtab_stmt                         |
+----------------------------------------------------------------+
| CREATE EXTERNAL TABLE ssb_customer_txt_obj(                    |
|   c_custkey int,                                               |
|   c_name string,                                               |
|   c_address string,                                            |
|   c_city string,                                               |
|   c_nation string,                                             |
|   c_region string,                                             |
|   c_phone string,                                              |
|   c_mktsegment string)                                         |
| ROW FORMAT SERDE                                               |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'         |
| WITH SERDEPROPERTIES (                                         |
|   'field.delim'='|',                                           |
|   'serialization.format'='|')                                  |
| STORED AS INPUTFORMAT                                          |
|   'org.apache.hadoop.mapred.TextInputFormat'                   |
| OUTPUTFORMAT                                                   |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                                       |
|   'oci://bigdata@ocichina001/ssb100_data/customer'             |
| TBLPROPERTIES (                                                |
|   'bucketing_version'='2',                                     |
|   'discover.partitions'='true',                                |
|   'transient_lastDdlTime'='1625018554')                        |
+----------------------------------------------------------------+
24 rows selected (5.365 seconds)
```

```
0: jdbc:hive2://bigdata-hadoop-2.sub070606371>
0: jdbc:hive2://bigdata-hadoop-2.sub070606371> select * from ssb_customer_txt_obj limit 2;
DEBUG : Acquired the compile lock.
INFO  : Compiling command(queryId=hive_20210701171017_9a358f1f-1c0c-43d0-ad3c-1a55777e821f): select * from ssb_customer_txt_obj limit 2
DEBUG : Encoding valid txns info 2042:9223372036854775807:: txnid:2042
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:ssb_customer_txt_obj.c_custkey, type:int, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_name, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_address, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_city, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_nation, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_region, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_phone, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_mktsegment, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210701171017_9a358f1f-1c0c-43d0-ad3c-1a55777e821f); Time taken: 6.69 seconds
INFO  : Executing command(queryId=hive_20210701171017_9a358f1f-1c0c-43d0-ad3c-1a55777e821f): select * from ssb_customer_txt_obj limit 2
INFO  : Completed executing command(queryId=hive_20210701171017_9a358f1f-1c0c-43d0-ad3c-1a55777e821f); Time taken: 0.342 seconds
INFO  : OK
DEBUG : Shutting down query select * from ssb_customer_txt_obj limit 2
+---------------------------------+------------------------------+---------------------------------+------------------------------+--------------------------------+--------------------------------+-------------------------------+------------------------------------+
| ssb_customer_txt_obj.c_custkey  | ssb_customer_txt_obj.c_name  | ssb_customer_txt_obj.c_address  | ssb_customer_txt_obj.c_city  | ssb_customer_txt_obj.c_nation  | ssb_customer_txt_obj.c_region  | ssb_customer_txt_obj.c_phone  | ssb_customer_txt_obj.c_mktsegment  |
+---------------------------------+------------------------------+---------------------------------+------------------------------+--------------------------------+--------------------------------+-------------------------------+------------------------------------+
| 1                               | Customer#000000001           | j5JsirBM9P                      | MOROCCO 0                    | MOROCCO                        | AFRICA                         | 25-989-741-2988               | BUILDING                           |
| 2                               | Customer#000000002           | 487LW1dovn6Q4dMVym              | JORDAN 1                     | JORDAN                         | MIDDLE EAST                    | 23-768-687-3665               | AUTOMOBILE                         |
+---------------------------------+------------------------------+---------------------------------+------------------------------+--------------------------------+--------------------------------+-------------------------------+------------------------------------+
2 rows selected (14.117 seconds)
```

```
0: jdbc:hive2://bigdata-hadoop-2.sub070606371> select count(*) from ssb_customer_txt_obj;
DEBUG : Acquired the compile lock.
INFO  : Compiling command(queryId=hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7): select count(*) from ssb_customer_txt_obj
DEBUG : Encoding valid txns info 2041:9223372036854775807:: txnid:2041
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7); Time taken: 16.057 seconds
INFO  : Executing command(queryId=hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7): select count(*) from ssb_customer_txt_obj
INFO  : Query ID = hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
DEBUG : Task getting executed using mapred tag : hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7,userid=root
INFO  : Subscribed to counters: [] for queryId: hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7
INFO  : Tez session hasn't been created yet. Opening session
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/orai18n.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/osdt_cert.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/oraclepki.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/xdb.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/hive-hcatalog-core.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/osdt_jce.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/ucp.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/ojdbc8.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/osdt_core.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/osh.jar"
INFO  : Dag name: select count(*) from ssb_customer_txt_obj (Stage-1)
DEBUG : DagInfo: {"context":"Hive","description":"select count(*) from ssb_customer_txt_obj"}
DEBUG : Setting Tez DAG access for queryId=hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7 with viewAclString=, modifyStr=root,hive
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1625031757474_0001_1_00, diagnostics=[Vertex vertex_1625031757474_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ssb_customer_txt_obj initializer failed, vertex=vertex_1625031757474_0001_1_00 [Map 1], org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "oci"
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:268)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:781)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1625031757474_0001_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1625031757474_0001_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    AM_CPU_MILLISECONDS: 2640
INFO  :    AM_GC_TIME_MILLIS: 124
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1625031757474_0001_1_00, diagnostics=[... same UnsupportedFileSystemException and stack trace as above ...] Vertex killed, vertexName=Reducer 2, vertexId=vertex_1625031757474_0001_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1625031757474_0001_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE] DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
DEBUG : Shutting down query select count(*) from ssb_customer_txt_obj

    VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1        container  INITIALIZING     -1          0        0       -1       0       0
Reducer 2    container        INITED      1          0        0        1       0       0

VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 6.86 s

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1625031757474_0001_1_00, diagnostics=[... same UnsupportedFileSystemException and stack trace as above ...] Vertex killed, vertexName=Reducer 2, vertexId=vertex_1625031757474_0001_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1625031757474_0001_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE] DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)
```

jodoglevy commented 3 years ago

@zhengwanbo thanks for filing this issue - we'll take a look and get back to you

omkar07 commented 3 years ago

Hi @zhengwanbo, it seems you are not able to run the count query, but this has nothing to do with the HDFS connector itself. You will need to do the following:

  1. Reference the JAR files before starting the Spark shell, i.e. place the HDFS connector lib and third-party JARs in spark-3.1.2-bin-hadoop3.2/jars.
  2. Create a .oci folder and copy your API keys into it. Also create a core-site.xml (a minimal sketch is shown after this list) and place it in the spark-3.1.2-bin-hadoop3.2/conf folder along with the spark-defaults.conf file. Follow: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/hdfsconnectorspark.htm
  3. Run `set hive.compute.query.using.stats=false;` on Beeline/Hive. This way, Hive performs the count through a MapReduce job over the data actually present in storage, instead of answering from pre-computed table statistics.
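For step 2, here is a minimal core-site.xml sketch, assuming API-key authentication. The region endpoint, OCIDs, fingerprint, and key path below are placeholders to replace with your own values; property names follow the connector documentation. Explicitly mapping fs.oci.impl to the connector class is a common safeguard when the oci:// scheme is not picked up automatically:

```xml
<configuration>
  <!-- Map the oci:// scheme to the connector's FileSystem implementation;
       this addresses "No FileSystem for scheme oci" in cases where the
       class is not discovered automatically from the classpath -->
  <property>
    <name>fs.oci.impl</name>
    <value>com.oracle.bmc.hdfs.BmcFilesystem</value>
  </property>

  <!-- Object Storage endpoint for your region (placeholder) -->
  <property>
    <name>fs.oci.client.hostname</name>
    <value>https://objectstorage.us-ashburn-1.oraclecloud.com</value>
  </property>

  <!-- API-key credentials: all values below are placeholders -->
  <property>
    <name>fs.oci.client.auth.tenantId</name>
    <value>ocid1.tenancy.oc1..exampletenancy</value>
  </property>
  <property>
    <name>fs.oci.client.auth.userId</name>
    <value>ocid1.user.oc1..exampleuser</value>
  </property>
  <property>
    <name>fs.oci.client.auth.fingerprint</name>
    <value>20:3b:97:13:55:1c:5b:0d:d3:37:d8:50:4e:c5:3a:34</value>
  </property>
  <property>
    <name>fs.oci.client.auth.pemfilepath</name>
    <value>/home/opc/.oci/oci_api_key.pem</value>
  </property>
</configuration>
```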

I was able to set up and run the count query by following the above steps. Below is a screenshot. Please let me know if you have any questions.

[Screenshot: Screen Shot 2021-07-16 at 4 05 59 PM]
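For copy-paste convenience, the step 3 part of the session presumably boils down to the following Beeline commands (table name taken from this issue):

```sql
-- Disable stats-based answers so count(*) actually scans the oci:// data
set hive.compute.query.using.stats=false;
select count(*) from ssb_customer_txt_obj;
```
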
jodoglevy commented 3 years ago

@zhengwanbo - checking in here since we haven't heard back from you in a while. Did @omkar07's response resolve your issue?

omkar07 commented 3 years ago

Hi @zhengwanbo - since we haven't heard back from you, we're resolving this issue. But feel free to reopen this if you are still experiencing problems.