lucidworks / hive-solr

Code to index Hive tables to Solr and Solr indexes to Hive
Apache License 2.0

Error loading into solr table from another hive table. #13

Open aftnix opened 8 years ago

aftnix commented 8 years ago
sudo -u solr bin/solr create -c hiveCollection -d basic_configs -n hiveCollection -s 2 -rf 2

hive> CREATE EXTERNAL TABLE authproc_syslog_solr (
        hid STRING, tstamp TIMESTAMP, type STRING, msg STRING, thost STRING,
        tservice STRING, tyear STRING, tmonth STRING, tday STRING)
      STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
      LOCATION '/tmp/solr'
      TBLPROPERTIES('solr.zkhost' = 'hadoop1.openstacksetup.com:2181/solr',
                    'solr.collection' = 'hiveCollection',
                    'solr.query' = '*:*');

hive> INSERT OVERWRITE TABLE authproc_syslog_solr SELECT s.* FROM authproc_syslog s;

Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:32, Vertex vertex_1473357519389_0194_6_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1473357519389_0194_6_00, diagnostics=[Task failed, taskId=task_1473357519389_0194_6_00_000009, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
Caused by: java.lang.NullPointerException
        at com.lucidworks.hadoop.io.impl.LWSolrDocument.getId(LWSolrDocument.java:46)
        at com.lucidworks.hadoop.io.LucidWorksWriter.write(LucidWorksWriter.java:184)
        at com.lucidworks.hadoop.hive.LWHiveOutputFormat$1.write(LWHiveOutputFormat.java:39)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:764)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:102)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)

The Hive table and the hive_solr table have exactly the same schema.
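(Editor's note: the NullPointerException is thrown in LWSolrDocument.getId, i.e. while the writer is looking up the document's id, so a missing or misplaced id field is the first thing to check. One way to compare the two schemas side by side, using standard Hive commands and the table names from the DDL above:)

hive> DESCRIBE authproc_syslog;
hive> DESCRIBE authproc_syslog_solr;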

aftnix commented 8 years ago

Turns out my table didn't have id as the first field. I fixed it, but now the INSERT never finishes (I waited a couple of hours, reduced the dataset, etc., and it still never completes).
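(Editor's note: for readers hitting the same NullPointerException, the fix described above is to make id the first column of the Solr-backed table. A minimal sketch of the corrected DDL, assuming an id STRING column is simply added in front of the original schema:)

hive> CREATE EXTERNAL TABLE authproc_syslog_solr (
        id STRING, hid STRING, tstamp TIMESTAMP, type STRING, msg STRING, thost STRING,
        tservice STRING, tyear STRING, tmonth STRING, tday STRING)
      STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
      LOCATION '/tmp/solr'
      TBLPROPERTIES('solr.zkhost' = 'hadoop1.openstacksetup.com:2181/solr',
                    'solr.collection' = 'hiveCollection',
                    'solr.query' = '*:*');

The INSERT then has to supply a non-null, unique value for id, for example (hypothetical, assuming hid is unique): INSERT OVERWRITE TABLE authproc_syslog_solr SELECT s.hid AS id, s.* FROM authproc_syslog s;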

The YARN logs contain these:

2016-09-26 17:24:21,082 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1474881768573_0002_2][Event:VERTEX_FINISHED]: vertexName=Map 1, vertexId=vertex_1474881768573_0002_2_00, initRequestedTime=1474883498501, initedTime=1474883499061, startRequestedTime=1474883498577, startedTime=1474883499061, finishTime=1474889061031, timeTaken=5561970, status=KILLED, diagnostics=Vertex received Kill while in RUNNING state.
Vertex did not succeed due to DAG_KILL, failedTasks:0 killedTasks:3
Vertex vertex_1474881768573_0002_2_00 [Map 1] killed/failed due to:DAG_KILL, counters=Counters: 0, vertexStats=firstTaskStartTime=1474883503313, firstTasksToStart=[ task_1474881768573_0002_2_00_000001 ], lastTaskFinishTime=1474889061030, lastTasksToFinish=[ task_1474881768573_0002_2_00_000002,task_1474881768573_0002_2_00_000001 ], minTaskDuration=-1, maxTaskDuration=-1, avgTaskDuration=-1.0, numSuccessfulTasks=0, shortestDurationTasks=[  ], longestDurationTasks=[  ], vertexTaskStats={numFailedTaskAttempts=0, numKilledTaskAttempts=0, numCompletedTasks=3, numSucceededTasks=0, numKilledTasks=3, numFailedTasks=0}

Don't know what's going wrong here :(
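(Editor's note: one thing worth ruling out when the insert hangs like this, a suggestion not established in the thread itself, is whether the map tasks can actually reach the Solr collection. Solr's CLI has a healthcheck command that can be pointed at the same ZooKeeper address used in the table properties:)

sudo -u solr bin/solr healthcheck -c hiveCollection -z hadoop1.openstacksetup.com:2181/solr

If the collection reports unhealthy replicas, the writer may be retrying indefinitely rather than failing outright.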

ctargett commented 8 years ago

Sorry it took a few days to get back to you.

Are there any errors besides those messages?

Can you also share a bit about your environment? It seems you're using Tez; what version/distro of Hive?

vishnucg commented 8 years ago

I am able to load data into the Solr external table from another managed Hive table, but when I try to retrieve data from the Solr table it throws "Failed with exception java.io.IOException:java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String". I am using solr-hive-serde-2.2.6.jar on Hive 1.1.0-cdh5.4.5.
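(Editor's note: that ClassCastException usually indicates a type mismatch between the Solr field and the Hive column: the serde hands Hive an Integer while the column is declared STRING. A hedged sketch of the fix, using a hypothetical field year that the Solr schema is assumed to store as an int, is to declare the matching type in the Hive DDL:)

-- Hypothetical column: if Solr stores 'year' as an int, declare it INT, not STRING.
CREATE EXTERNAL TABLE solr_docs (id STRING, year INT)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
LOCATION '/tmp/solr'
TBLPROPERTIES('solr.zkhost' = 'zkhost:2181/solr',
              'solr.collection' = 'collection1',
              'solr.query' = '*:*');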

acesar commented 8 years ago

@vishnucg can you please open a new issue with your question?

shazack commented 7 years ago

Did this issue get resolved? I'm getting the same error.

shazack commented 7 years ago

I'm getting:

Caused by: java.lang.NullPointerException
        at com.lucidworks.hadoop.io.impl.LWSolrDocument.getId(LWSolrDocument.java:46)
        at com.lucidworks.hadoop.io.LucidWorksWriter.write(LucidWorksWriter.java:190)
        ... 22 more ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":null,"_col1":null,"_col2":null,"_col3":null,"_col4":null,"_col5":null,"_col6":null,"_col7":null,"_col8":null,"_col9":null,"_col10":null,"_col11":null,"_col12":null,"_col13":null,"_col14":null,"_col15":null,"_col16
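(Editor's note: this is the same getId NullPointerException from the top of the thread, and the row dump shows every column as null, which may point at the source query rather than at Solr. A quick sanity check before the INSERT, sketched here with hypothetical table names:)

hive> SELECT * FROM source_table LIMIT 10;                 -- are the columns populated at all?
hive> SELECT COUNT(*) FROM source_table WHERE id IS NULL;  -- the first column must be a non-null id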