metamx / druid-spark-batch

Druid indexing plugin for using Spark in batch jobs
Apache License 2.0

Make indexer handle more RDD types #10

Open drcrallen opened 9 years ago

drcrallen commented 9 years ago

In general, it would be preferable if the RDD were passed as a parameter to the indexer. That way, the indexing process can be kept separate from the initial formatting of the RDD, and this code can be used as a library more easily.
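
To illustrate the separation being proposed, here is a minimal sketch in plain Python, with an ordinary iterable standing in for an RDD. Every name in it (`index_rows`, `parse_csv_lines`, `Segment`, the `timestamp` field, `bucket_ms`) is hypothetical and not part of druid-spark-batch. The real indexer is Scala on Spark and builds Druid segments, which this sketch does not attempt; it only shows an indexer that takes already-formatted rows as a parameter.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class Segment:
    """Hypothetical stand-in for an indexed segment: rows in one time bucket."""
    bucket: int
    rows: List[Row]


def index_rows(rows: Iterable[Row], bucket_ms: int) -> List[Segment]:
    """Index already-parsed rows; knows nothing about how the input was read."""
    grouped: Dict[int, List[Row]] = {}
    for row in rows:
        # Truncate the timestamp down to the start of its bucket.
        bucket = int(row["timestamp"]) // bucket_ms * bucket_ms
        grouped.setdefault(bucket, []).append(row)
    return [Segment(b, grouped[b]) for b in sorted(grouped)]


def parse_csv_lines(lines: Iterable[str]) -> Iterable[Row]:
    """One possible formatting step; it lives outside the indexer (assumed CSV layout)."""
    for line in lines:
        ts, page, count = line.split(",")
        yield {"timestamp": int(ts), "page": page, "count": int(count)}


# The same indexer accepts rows from any source once they are formatted.
csv_segments = index_rows(
    parse_csv_lines(["0,home,1", "500,about,2", "1500,home,3"]), bucket_ms=1000
)
dict_segments = index_rows(
    [{"timestamp": 2500, "page": "home", "count": 4}], bucket_ms=1000
)
```

Because `index_rows` only sees rows, callers can swap the parsing step without touching the indexing step, which is the property that would make the code usable as a library.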

arnaudbriche commented 7 years ago

Hi,

Is there any plan to make the project more usable as a library rather than as a standalone job?

Gauravshah commented 7 years ago

Definitely agree with @drcrallen.

Gauravshah commented 7 years ago

related branch: https://github.com/metamx/druid-spark-batch/compare/dataframes

Gauravshah commented 6 years ago

@drcrallen I would like to work on this. I think we can convert a DataFrame to the smoosh format using this library. However, I was unable to figure out how to let the coordinator/overlord know that I have created smoosh files, so that it picks them up, uploads them, and updates the metadata. Any directions on that?

leventov commented 6 years ago

@Gauravshah FYI https://groups.google.com/d/msg/druid-development/FK9Tz4TtKeQ/e7wH516BBAAJ