metamx / druid-spark-batch

Druid indexing plugin for using Spark in batch jobs
Apache License 2.0

Can't use spark-batch-indexer #74

Open benwck opened 8 years ago

benwck commented 8 years ago

Hey guys,

I don't want to spam the druid-development group thread, so I'm posting here. I built the jar myself against Spark 1.6.1 and added it to /druidpath/extensions/druid-spark-batch/ on both the overlord and the middle manager. I also added druid.indexer.task.defaultHadoopCoordinates=["org.apache.spark:spark-core_2.10:1.6.1"] to the runtime properties of both nodes, restarted them, and submitted the job with a JSON file. I still get:

"error": "Could not resolve type id 'index_spark' into a subtype of [simple type, class io.druid.indexing.common.task.Task]\n at [Source: HttpInputOverHTTP@2cecd2f2; line: 54, column: 38]"

Any idea, or is there more documentation somewhere?
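For reference, here is roughly what I have in the runtime properties on both nodes (a sketch of my setup; the loadList entry is my assumption of what's needed to load the extension):

```properties
# runtime properties on both the overlord and the middle manager
# (assumption: the extension must also appear in druid.extensions.loadList)
druid.extensions.loadList=["druid-spark-batch"]
druid.indexer.task.defaultHadoopCoordinates=["org.apache.spark:spark-core_2.10:1.6.1"]
```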

Thanks, Ben

benwck commented 8 years ago

The overlord raises the exception, but I checked the log and the module loads correctly:

2016-06-23T15:03:43,693 INFO [main] io.druid.initialization.Initialization - Loading extension [druid-spark-batch] for class [io.druid.initialization.DruidModule]

drcrallen commented 8 years ago

Hi Ben, thanks for the information. A few things to double-check: make sure you are using the https://github.com/metamx/druid-spark-batch/tree/druid0.9.0 branch for druid 0.9.0, and if you are using a middle manager, please make sure it ALSO loads the extension properly.
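Something like the following should get you onto the right branch (the build step is a sketch — I'm assuming the usual sbt assembly flow for a Scala project; use whatever build command you normally use):

```bash
# check out the branch that matches druid 0.9.0
git clone https://github.com/metamx/druid-spark-batch.git
cd druid-spark-batch
git checkout druid0.9.0

# build the jar (assumption: standard sbt assembly; adjust per the project's docs)
sbt assembly
```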

benwck commented 8 years ago

Actually, my middleManager doesn't load the extension. I did the same thing on both nodes, but it is not working on the middle manager. I will investigate and let you know. Many thanks!

benwck commented 8 years ago

I now see the following logs on both the middleManager and the overlord, but still get the same error.

Logs:

2016-06-28T08:14:11,916 INFO [main] io.druid.initialization.Initialization - Loading extension [druid-spark-batch] for class [io.druid.cli.CliCommandCreator]
2016-06-28T08:14:11,917 INFO [main] io.druid.initialization.Initialization - added URL[file:/home/ec2-user/druid-0.9.0/extensions/druid-spark-batch/druid-spark-batch_2.10.jar]

Error:

{ "error": "Could not resolve type id 'index_spark' into a subtype of [simple type, class io.druid.indexing.common.task.Task]\n at [Source: HttpInputOverHTTP@7e42fe95; line: 54, column: 38]" }

Any suggestions on how to investigate this problem?

drcrallen commented 8 years ago

@benwck Assuming there are no errors reported during node startup, the only thing I can think of would be to test with just the overlord locally (where the runner is local rather than remote) and see if the overlord accepts the task under that condition. That way you eliminate any weird cross-node communication issues.
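For reference, switching the overlord to the local runner is a one-line change in its runtime properties (a sketch; local is already the default in a stock config):

```properties
# overlord runtime properties: run tasks in the overlord's own JVM
# instead of dispatching them to middle managers
druid.indexer.runner.type=local
```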

ImrulKayes commented 7 years ago

I am having a similar issue. As @drcrallen suggested, I am running the overlord locally (i.e., a single-node cluster with all services running locally). I also built the jar with Spark 1.6.1 and added it to $DRUID_HOME/extensions/druid-spark-batch/ on both the overlord and the middle managers, and added druid.indexer.task.defaultHadoopCoordinates=["org.apache.spark:spark-core_2.10:1.6.1"] to the runtime properties of the overlord and the middle managers. Then I used pull-deps to get org.apache.spark:spark-core_2.10:1.6.1 into $DRUID_HOME/hadoop-dependencies. When I started all the services, the overlord and middle managers came up with no error logs. However, when I submit the job with a JSON file, I get the same error:

{"error":"Could not resolve type id 'index_spark' into a subtype of [simple type, class io.druid.indexing.common.task.Task]\n at [Source: HttpInputOverHTTP@2c648040; line: 1, column: 3426]"}

Any idea?
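For completeness, this is roughly the pull-deps invocation I used (a sketch; the classpath assumes the standard Druid distribution layout under $DRUID_HOME):

```bash
# run from $DRUID_HOME; fetches the Spark artifact into hadoop-dependencies/
java -classpath "lib/*" io.druid.cli.Main tools pull-deps \
  --no-default-hadoop \
  -h "org.apache.spark:spark-core_2.10:1.6.1"
```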

kosii commented 7 years ago

Make sure to submit your task to the /druid/indexer/v1/task endpoint, and not to /druid/indexer/v1/supervisor. I lost a lot of time because of this :D
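For example (host, port, and file name here are illustrative; 8090 is the overlord's default port):

```bash
# POST the task spec to the overlord's task endpoint, not the supervisor endpoint
curl -X POST -H 'Content-Type: application/json' \
  -d @spark_index_task.json \
  http://overlord-host:8090/druid/indexer/v1/task
```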