Closed: whs2k closed this issue 3 years ago
thanks for your question @whs2k
is your main use case inserting data? or inserting and reading?
Main use case is inserting data into ES; reading would be a nice-to-have but is not the priority right now. FYI, in PySpark, reading RDDs from Elasticsearch is handled by the newAPIHadoopRDD() method.
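For reference, a minimal PySpark sketch of the read path mentioned above, using the es-hadoop connector's `EsInputFormat` with `newAPIHadoopRDD()`. This assumes the es-hadoop jar is on the Spark classpath; the host names and index name are placeholders, not values from this thread.

```python
# Hedged sketch: reading an Elasticsearch index into a PySpark RDD
# via the es-hadoop connector. Hosts/index below are placeholders.

def es_read_conf(nodes, port, resource):
    """Build the es-hadoop settings consumed by newAPIHadoopRDD()."""
    return {
        "es.nodes": nodes,        # comma-separated list of ES hosts
        "es.port": str(port),     # es-hadoop expects string values
        "es.resource": resource,  # "index/type" to read from
    }

def read_index(sc, conf):
    # Returns an RDD of (doc_id, document) pairs.
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )

# Example config for a multi-node cluster (placeholder host names):
conf = es_read_conf("es-node1,es-node2", 9200, "myindex/mytype")
```

The multi-node case is handled by listing every node in `es.nodes`; es-hadoop discovers the rest of the cluster from there.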
okay, thanks
Posting the StackOverflow conversation here for visibility: https://stackoverflow.com/questions/49141042/how-to-read-and-write-to-elasticsearch-with-sparkr/62385203#62385203
@whs2k so does that SO answer solve your problem? Or do you still hope for some solution with this package?
Is there a way to write a SparkR DataFrame or RDD to Elasticsearch with multiple nodes?
This elastic package for R is great for normal interactions with Elasticsearch, but it says nothing about Hadoop, distributed DataFrames, or RDDs in SparkR 2.0+. When I try to use it I get the following errors:
If this were PySpark, I would use the rdd.saveAsNewAPIHadoopFile() function as shown here, but I can't find any information about it in SparkR from googling. Elasticsearch also has good documentation, but only for Scala and Java. Note that my Elasticsearch cluster has multiple nodes and, in Zeppelin, I am using the %spark2.r interpreter. This is a re-post of a SO question.
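For comparison, the PySpark write path referenced above can be sketched with the es-hadoop connector's `EsOutputFormat` and `saveAsNewAPIHadoopFile()`. This is a hedged sketch, not code from this thread: it assumes the es-hadoop jar is on the classpath, and the hosts and index name are placeholders.

```python
# Hedged sketch: writing an RDD of JSON documents to Elasticsearch
# via saveAsNewAPIHadoopFile(). Hosts/index below are placeholders.

def es_write_conf(nodes, port, resource):
    """Build the es-hadoop settings for saveAsNewAPIHadoopFile()."""
    return {
        "es.nodes": nodes,          # comma-separated ES hosts
        "es.port": str(port),
        "es.resource": resource,    # target "index/type"
        "es.input.json": "yes",     # values are JSON strings
    }

def save_rdd_to_es(rdd, conf):
    # rdd must contain (key, json_string) pairs.
    rdd.saveAsNewAPIHadoopFile(
        path="-",  # ignored by EsOutputFormat, but required by the API
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )

conf = es_write_conf("es-node1,es-node2", 9200, "myindex/mytype")
```

From SparkR there is no direct equivalent of this call, which is why the linked SO answer routes the data through a different mechanism instead.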