Closed: chen-rj88 closed this issue 3 months ago
Thanks for the report. Please provide the complete run log so we can reproduce and locate the problem.
Hi, the detailed execution log is attached. The screenshot shows the result of querying via SQL after the job completed successfully. Looking forward to your reply. addax_oracle-hive-orc_20240711_150451_186301.log
Please provide the complete Addax execution log as text, and avoid screenshots where possible.
Here is the log output:
2024-07-11 15:04:51.673 [ main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-07-11 15:04:51.689 [ main] INFO Engine -
{
  "setting": {
    "speed": {
      "channel": 1
    },
    "errorLimit": {
      "record": 0,
      "percentage": 0.02
    }
  },
  "content": {
    "reader": {
      "name": "sqlserverreader",
      "parameter": {
        "username": "YjsySjdj",
        "password": "*****",
        "column": [
          "[termcode]",
          "[termname]",
          "[year]",
          "[term]",
          "[weeks]",
          "[begindate]",
          "[enddate]",
          "[currentxq]"
        ],
        "where": "termcode != '3'",
        "connection": [
          {
            "table": [
              "dbo.py_tabterm"
            ],
            "jdbcUrl": [
              "jdbc:sqlserver://10.100.98.16:1433;DatabaseName=gmis5;CurrentSchema=dbo;encrypt=false"
            ]
          }
        ]
      }
    },
    "writer": {
      "name": "hdfswriter",
      "parameter": {
        "defaultFS": "hdfs://10.100.98.94:8020",
        "fileType": "orc",
        "fieldDelimiter": "\u0001",
        "path": "/data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709",
        "fileName": "ods_py_tabterm2",
        "writeMode": "truncate",
        "column": [
          { "name": "termcode", "type": "string" },
          { "name": "termname", "type": "string" },
          { "name": "year", "type": "int" },
          { "name": "term", "type": "string" },
          { "name": "weeks", "type": "int" },
          { "name": "begindate", "type": "string" },
          { "name": "enddate", "type": "string" },
          { "name": "currentxq", "type": "string" }
        ]
      }
    }
  }
}
2024-07-11 15:04:51.717 [ main] INFO JobContainer - The jobContainer begins to process the job.
2024-07-11 15:04:52.148 [ job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl [jdbc:sqlserver://10.100.98.16:1433;DatabaseName=gmis5;CurrentSchema=dbo;encrypt=false].
2024-07-11 15:04:52.188 [ job-0] INFO OriginalConfPretreatmentUtil - The table [dbo.py_tabterm] has columns [termcode,termname,year,term,weeks,begindate,enddate,currentxq].
2024-07-11 15:04:52.427 [ job-0] WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-07-11 15:04:52.905 [ job-0] INFO JobContainer - The Reader.Job [sqlserverreader] perform prepare work .
2024-07-11 15:04:52.905 [ job-0] INFO JobContainer - The Writer.Job [hdfswriter] perform prepare work .
2024-07-11 15:04:53.053 [ job-0] INFO JobContainer - Job set Channel-Number to 1 channel(s).
2024-07-11 15:04:53.055 [ job-0] INFO JobContainer - The Reader.Job [sqlserverreader] is divided into [1] task(s).
2024-07-11 15:04:53.056 [ job-0] INFO HdfsWriter$Job - Begin splitting ...
2024-07-11 15:04:53.066 [ job-0] INFO HdfsWriter$Job - The split wrote files :[/data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709/.94a5ee51_7494_4de4_9093_5ceca6589faa/ods_py_tabterm2_20240711_150453_065_8tw5v1gu.orc]
2024-07-11 15:04:53.067 [ job-0] INFO HdfsWriter$Job - Finish splitting.
2024-07-11 15:04:53.067 [ job-0] INFO JobContainer - The Writer.Job [hdfswriter] is divided into [1] task(s).
2024-07-11 15:04:53.088 [ job-0] INFO JobContainer - The Scheduler launches [1] taskGroup(s).
2024-07-11 15:04:53.095 [ taskGroup-0] INFO TaskGroupContainer - The taskGroupId=[0] started [1] channels for [1] tasks.
2024-07-11 15:04:53.097 [ taskGroup-0] INFO Channel - The Channel set byte_speed_limit to -1, No bps activated.
2024-07-11 15:04:53.097 [ taskGroup-0] INFO Channel - The Channel set record_speed_limit to -1, No tps activated.
2024-07-11 15:04:53.106 [ reader-0-0] INFO CommonRdbmsReader$Task - Begin reading records by executing SQL query: [select [termcode],[termname],[year],[term],[weeks],[begindate],[enddate],[currentxq] from dbo.py_tabterm where (termcode != '3')].
2024-07-11 15:04:53.112 [ writer-0-0] INFO HdfsWriter$Task - Begin to write file : [/data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709/.94a5ee51_7494_4de4_9093_5ceca6589faa/ods_py_tabterm2_20240711_150453_065_8tw5v1gu.orc]
2024-07-11 15:04:53.141 [ writer-0-0] INFO HadoopShimsCurrent - Can't get KeyProvider for ORC encryption from hadoop.security.key.provider.path.
2024-07-11 15:04:53.151 [ reader-0-0] INFO CommonRdbmsReader$Task - Finished reading records by executing SQL query: [select [termcode],[termname],[year],[term],[weeks],[begindate],[enddate],[currentxq] from dbo.py_tabterm where (termcode != '3')].
2024-07-11 15:04:53.189 [ writer-0-0] INFO PhysicalFsWriter - ORC writer created for path: /data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709/.94a5ee51_7494_4de4_9093_5ceca6589faa/ods_py_tabterm2_20240711_150453_065_8tw5v1gu.orc with stripeSize: 67108864 blockSize: 268435456 compression: Compress: none buffer: 262144
2024-07-11 15:04:53.218 [ writer-0-0] INFO WriterImpl - ORC writer created for path: /data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709/.94a5ee51_7494_4de4_9093_5ceca6589faa/ods_py_tabterm2_20240711_150453_065_8tw5v1gu.orc with stripeSize: 67108864 options: Compress: none buffer: 262144
2024-07-11 15:04:53.405 [ writer-0-0] INFO HdfsWriter$Task - Finish write
2024-07-11 15:04:56.104 [ job-0] INFO AbstractScheduler - The scheduler has completed all tasks.
2024-07-11 15:04:56.104 [ job-0] INFO JobContainer - The Writer.Job [hdfswriter] perform post work.
2024-07-11 15:04:56.106 [ job-0] INFO HdfsHelper - Begin to move file from [hdfs://10.100.98.94:8020/data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709/.94a5ee51_7494_4de4_9093_5ceca6589faa/ods_py_tabterm2_20240711_150453_065_8tw5v1gu.orc] to [ds=20240709].
2024-07-11 15:04:56.114 [ job-0] INFO HdfsHelper - Finish move file(s).
2024-07-11 15:04:56.114 [ job-0] INFO HdfsHelper - Begin to delete temporary dir [/data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709/.94a5ee51_7494_4de4_9093_5ceca6589faa] .
2024-07-11 15:04:56.119 [ job-0] INFO HdfsHelper - Finish deleting temporary dir [/data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709/.94a5ee51_7494_4de4_9093_5ceca6589faa] .
2024-07-11 15:04:56.119 [ job-0] INFO JobContainer - The Reader.Job [sqlserverreader] perform post work.
2024-07-11 15:04:56.121 [ job-0] INFO StandAloneJobContainerCommunicator - Total 43 records, 1979 bytes | Speed 659B/s, 14 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2024-07-11 15:04:56.122 [ job-0] INFO JobContainer -
Job start at : 2024-07-11 15:04:51
Job end at : 2024-07-11 15:04:56
Job took secs : 4s
Average bps : 659B/s
Average rps : 14rec/s
Number of rec : 43
Failed record : 0
Judging from the log, nothing seems wrong. Please provide the following additional information:
Got it. The details are as follows:
Using the data you provided, I can read it without any problem in my Hive environment. The procedure was as follows:
Create the external table and add a partition:
CREATE EXTERNAL TABLE ods_py_tabterm2 (
    termcode STRING,
    termname STRING,
    year INT,
    term STRING,
    weeks INT,
    begindate STRING,
    enddate STRING,
    currentxq STRING
) PARTITIONED BY (ds STRING) STORED AS ORC
LOCATION '/tmp/ods_py_tabterm2';

ALTER TABLE ods_py_tabterm2 ADD IF NOT EXISTS PARTITION (ds='20240711');
Upload the ORC file to the corresponding partition directory.
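A minimal sketch of this step, assuming the HDFS CLI is on the PATH and using the file name produced in the Addax log above:

hdfs dfs -mkdir -p /tmp/ods_py_tabterm2/ds=20240711
hdfs dfs -put ods_py_tabterm2_20240711_150453_065_8tw5v1gu.orc /tmp/ods_py_tabterm2/ds=20240711/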
Run the query:
select * from ods_py_tabterm2 where ds = '20240711' limit 10;
INFO : Compiling command(queryId=hive_20240711203322_704e35c0-ab5c-440b-b18d-8946b167ba52): select * from ods_py_tabterm2 where ds = '20240711' limit 10
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:ods_py_tabterm2.termcode, type:string, comment:null), FieldSchema(name:ods_py_tabterm2.termname, type:string, comment:null), FieldSchema(name:ods_py_tabterm2.year, type:int, comment:null), FieldSchema(name:ods_py_tabterm2.term, type:string, comment:null), FieldSchema(name:ods_py_tabterm2.weeks, type:int, comment:null), FieldSchema(name:ods_py_tabterm2.begindate, type:string, comment:null), FieldSchema(name:ods_py_tabterm2.enddate, type:string, comment:null), FieldSchema(name:ods_py_tabterm2.currentxq, type:string, comment:null), FieldSchema(name:ods_py_tabterm2.ds, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20240711203322_704e35c0-ab5c-440b-b18d-8946b167ba52); Time taken: 0.285 seconds
INFO : Executing command(queryId=hive_20240711203322_704e35c0-ab5c-440b-b18d-8946b167ba52): select * from ods_py_tabterm2 where ds = '20240711' limit 10
INFO : Completed executing command(queryId=hive_20240711203322_704e35c0-ab5c-440b-b18d-8946b167ba52); Time taken: 0.015 seconds
INFO : OK
+---------------------------+---------------------------+-----------------------+-----------------------+------------------------+----------------------------+--------------------------+----------------------------+---------------------+
| ods_py_tabterm2.termcode | ods_py_tabterm2.termname | ods_py_tabterm2.year | ods_py_tabterm2.term | ods_py_tabterm2.weeks | ods_py_tabterm2.begindate | ods_py_tabterm2.enddate | ods_py_tabterm2.currentxq | ods_py_tabterm2.ds |
+---------------------------+---------------------------+-----------------------+-----------------------+------------------------+----------------------------+--------------------------+----------------------------+---------------------+
| 10 | 2006-2007春学期 | 2007 | 2 | 20 | 2007-03-05 00:00:00.0 | 2007-07-14 00:00:00.0 | 0 | 20240711 |
| 11 | 2007-2008秋学期 | 2007 | 1 | 20 | 2007-08-26 00:00:00.0 | 2008-01-20 00:00:00.0 | 0 | 20240711 |
| 12 | 2007-2008春学期 | 2008 | 2 | 20 | 2008-02-25 00:00:00.0 | 2008-07-12 00:00:00.0 | 0 | 20240711 |
| 13 | 2008-2009秋学期 | 2008 | 1 | 20 | 2008-08-25 00:00:00.0 | 2009-01-09 00:00:00.0 | 0 | 20240711 |
| 14 | 2008-2009春学期 | 2009 | 2 | 20 | 2009-02-16 00:00:00.0 | 2009-07-03 00:00:00.0 | 0 | 20240711 |
| 15 | 2009-2010秋学期 | 2009 | 1 | 21 | 2009-08-23 00:00:00.0 | 2010-01-16 00:00:00.0 | 0 | 20240711 |
| 16 | 2009-2010春学期 | 2010 | 2 | 20 | 2010-02-27 00:00:00.0 | 2010-07-11 00:00:00.0 | 0 | 20240711 |
| 17 | 2010-2011秋学期 | 2010 | 1 | 20 | 2010-08-28 00:00:00.0 | 2011-01-16 00:00:00.0 | 0 | 20240711 |
| 18 | 2010-2011春学期 | 2011 | 2 | 20 | 2011-02-19 00:00:00.0 | 2011-07-08 00:00:00.0 | 0 | 20240711 |
| 19 | 2011-2012秋学期 | 2011 | 1 | 20 | 2011-08-29 00:00:00.0 | 2012-01-15 00:00:00.0 | 0 | 20240711 |
+---------------------------+---------------------------+-----------------------+-----------------------+------------------------+----------------------------+--------------------------+----------------------------+---------------------+
10 rows selected (0.357 seconds)
My Hive version is 3.1.0, while hdfswriter currently depends on Hadoop 3.0, so there may be a version compatibility issue. I will see if I can find a Hive 2.1.1 environment and test there.
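One way to check for such a mismatch is to inspect the ORC file's metadata and writer version directly. A sketch, assuming Hive's bundled orcfiledump utility is available on the node; the path is the file's final location from the log above:

hive --orcfiledump /data/hive/warehouse/gmc_ods.db/ods_py_tabterm2/ds=20240709/ods_py_tabterm2_20240711_150453_065_8tw5v1gu.orc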
Hi, I just changed the Hadoop version that hdfswriter depends on to 2.7.1 and tried again, but the data still cannot be read correctly.
By the way, does this project have a discussion/chat group? If so, I would like to join. Thanks!
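For reference, a dependency change like the one above would look roughly as follows. This is only a sketch, assuming the hdfswriter module pins its Hadoop client through a version property in its pom.xml; the actual property and artifact names in the Addax source may differ:

<!-- hypothetical pom.xml fragment; names may differ in the real project -->
<properties>
    <hadoop.version>2.7.1</hadoop.version>
</properties>
...
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>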
The provided ORC file passed testing under Hive 2.3.2.
The provided ORC file also passed testing under Hive 2.1.0:
0: jdbc:hive2://localhost:10000> select version();
+--------------------------------------------------+--+
| _c0 |
+--------------------------------------------------+--+
| 2.1.0 r9265bc24d75ac945bde9ce1a0999fddd8f2aae29 |
+--------------------------------------------------+--+
1 row selected (0.099 seconds)
0: jdbc:hive2://localhost:10000> select * from ods_py_tabterm2 limit 3;
+---------------------------+---------------------------+-----------------------+-----------------------+------------------------+----------------------------+--------------------------+----------------------------+---------------------+--+
| ods_py_tabterm2.termcode | ods_py_tabterm2.termname | ods_py_tabterm2.year | ods_py_tabterm2.term | ods_py_tabterm2.weeks | ods_py_tabterm2.begindate | ods_py_tabterm2.enddate | ods_py_tabterm2.currentxq | ods_py_tabterm2.ds |
+---------------------------+---------------------------+-----------------------+-----------------------+------------------------+----------------------------+--------------------------+----------------------------+---------------------+--+
| 10 | 2006-2007??? | 2007 | 2 | 20 | 2007-03-05 00:00:00.0 | 2007-07-14 00:00:00.0 | 0 | 20240711 |
| 11 | 2007-2008??? | 2007 | 1 | 20 | 2007-08-26 00:00:00.0 | 2008-01-20 00:00:00.0 | 0 | 20240711 |
| 12 | 2007-2008??? | 2008 | 2 | 20 | 2008-02-25 00:00:00.0 | 2008-07-12 00:00:00.0 | 0 | 20240711 |
+---------------------------+---------------------------+-----------------------+-----------------------+------------------------+----------------------------+--------------------------+----------------------------+---------------------+--+
3 rows selected (0.098 seconds)
The missing Chinese characters can be ignored; I just don't have Chinese fonts installed.
My test environment was deployed with Docker; for deployment details see https://github.com/big-data-europe/docker-hive/tree/2.1.0-postgresql-metastore
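For anyone wanting to reproduce that environment, the steps are roughly the following, based on that repository's README (service names come from its docker-compose file and may change):

git clone -b 2.1.0-postgresql-metastore https://github.com/big-data-europe/docker-hive.git
cd docker-hive
docker-compose up -d
docker-compose exec hive-server /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000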
What happened?
Writing ORC files through hdfswriter succeeds, but querying the table via Hive fails with "ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException". The same job works correctly in DataX.
Version
4.1.3 (Default)
OS Type
Linux (Default)
Java JDK Version
Oracle JDK 1.8.0
Relevant log output
No response