Open sunliwen opened 9 years ago
This is still an important calculation. I've adjusted the time span to 30 days and restarted it, to see whether the failure correlates with data volume.
With a 15-day calculation window there are no failures. It looks like the Hive script below still has a problem.
2014-12-23 08:11:26|INFO|HiveBased|HIVE_START calc_click_rec_buy
2014-12-23 08:11:53|CRITICAL|Batch Server|An Exception happened while running Job: <bound method HiveBasedStatisticsFlow.do_hive_based_calculations of <__main__.HiveBasedStatisticsFlow instance at 0x7f8b2dac7c68>>
Traceback (most recent call last):
  File "/cube/apps/poco/poco/services/batch/server.py", line 155, in _execJob
    callable()
  File "/cube/apps/poco/poco/services/batch/server.py", line 202, in do_hive_based_calculations
    connection, SITE_ID, self.getWorkDir(), backfilled_raw_logs_path)
  File "/home/pocoweb/cube/apps/poco/poco/services/batch/statistics/hive_based_calculations.py", line 633, in hive_based_calculations
    do_calculations(connection, site_id, work_dir, backfilled_raw_logs_path, client)
  File "/home/pocoweb/cube/apps/poco/poco/services/batch/statistics/hive_based_calculations.py", line 615, in do_calculations
    calc_click_rec_buy(site_id, connection, client)
  File "/home/pocoweb/cube/apps/poco/poco/services/batch/statistics/hive_based_calculations.py", line 52, in wrapped_function
    result = function(*arg, **kws)
  File "/home/pocoweb/cube/apps/poco/poco/services/batch/statistics/hive_based_calculations.py", line 453, in calc_click_rec_buy
    client.execute("INSERT OVERWRITE TABLE rec_buy "
  File "pylib/hive_service/ThriftHive.py", line 63, in execute
    self.recv_execute()
  File "pylib/hive_service/ThriftHive.py", line 84, in recv_execute
    raise result.ex
HiveServerException: HiveServerException(errorCode=9, message='Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask', SQLState='08S01')
2014-12-23 08:11:53|INFO|root|FlowEnd: HiveBasedStatisticsFlow
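The traceback shows the Thrift-level HiveServerException propagating up from client.execute and killing the whole job. A minimal sketch of catching it at the calculation step and logging the error details instead, assuming a hypothetical stand-in for the real pylib/hive_service client:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("HiveBased")


class HiveServerException(Exception):
    """Stand-in for the Thrift-generated exception in pylib/hive_service."""

    def __init__(self, message, errorCode, SQLState):
        super().__init__(message)
        self.errorCode = errorCode
        self.SQLState = SQLState


class FakeHiveClient:
    """Hypothetical client that fails the same way the log above shows."""

    def execute(self, query):
        raise HiveServerException(
            "Query returned non-zero code: 9, cause: FAILED: Execution Error, "
            "return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask",
            errorCode=9,
            SQLState="08S01",
        )


def calc_click_rec_buy(client):
    """Run the INSERT OVERWRITE step, logging Hive errors instead of crashing the flow."""
    try:
        client.execute("INSERT OVERWRITE TABLE rec_buy ...")
        return True
    except HiveServerException as ex:
        log.error("calc_click_rec_buy failed: errorCode=%s SQLState=%s %s",
                  ex.errorCode, ex.SQLState, ex)
        return False


ok = calc_click_rec_buy(FakeHiveClient())
```

This only contains the failure to one step; the underlying MapReduce error (return code 2) still needs to be diagnosed from the Hive/Hadoop task logs.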
Processing 37.75 million records:
2014-12-23 04:55:31|INFO|Backfiller|Count: 37750000, 3274.72780741 rows/sec
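As a quick sanity check, the wall-clock time implied by the logged backfill throughput works out to roughly three hours:

```python
# Estimate backfill duration from the logged count and throughput.
rows = 37_750_000
rate = 3274.72780741  # rows/sec, from the Backfiller log line

seconds = rows / rate
hours = seconds / 3600
print(f"{seconds:.0f} s = {hours:.1f} h")  # roughly 3.2 hours
```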
The tac step alone takes more than two hours.
One calculation still failed, but it does not appear to be a critical one.