Open drcrallen opened 9 years ago
Hi,
Is there any plan to make the project usable as a library rather than only as a standalone job?
Definitely agree with @drcrallen.
related branch: https://github.com/metamx/druid-spark-batch/compare/dataframes
@drcrallen I would like to work on this. I think we can convert a DataFrame to the smoosh format using this library. However, I was unable to figure out how to let the coordinator/overlord know that I have created smoosh files, so that it picks them up, uploads them, and updates the metadata. Any directions on that?
In general, it would be preferable if the RDD were passed as a parameter to the indexer. That way, the indexing process can be kept separate from the initial formatting of the RDD, and this code can be used as a library more easily.
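
To illustrate the separation, here is a minimal sketch in plain Python, with an ordinary iterable standing in for the RDD. The names `index_rows` and `run_job`, and the toy rollup they perform, are hypothetical and are not part of druid-spark-batch; the only point is that the indexing step receives already-formatted rows as a parameter instead of loading them itself.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def index_rows(rows: Iterable[Row], dimensions: List[str], metric: str) -> Dict[Tuple, float]:
    """Hypothetical library entry point: takes pre-formatted rows as a parameter.

    It knows nothing about where the rows came from. Here it simply sums one
    metric per unique combination of dimension values (a toy stand-in for
    real indexing).
    """
    rollup: Dict[Tuple, float] = {}
    for row in rows:
        key = tuple(row.get(d) for d in dimensions)
        rollup[key] = rollup.get(key, 0.0) + float(row[metric])
    return rollup


def run_job(load: Callable[[], Iterable[Row]], dimensions: List[str], metric: str) -> Dict[Tuple, float]:
    """Hypothetical standalone wrapper: loads the input, then delegates."""
    return index_rows(load(), dimensions, metric)


# Library-style use: the caller builds the rows however it likes.
rows = [
    {"page": "a", "count": 1},
    {"page": "a", "count": 2},
    {"page": "b", "count": 5},
]
result = index_rows(rows, ["page"], "count")
```

With this shape, a standalone job is just a thin wrapper around the library call, and a caller that already holds the data (for example, rows derived from a DataFrame) can skip the loading step entirely.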