ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Problem using the bulk load API #117

Closed sckott closed 8 years ago

sckott commented 8 years ago

from a user:

I am trying to use version 0.6.0 of the elastic package and am running into some difficulties. I want to load data from an R data frame into Elasticsearch, and I tried two ways to do it:

  • First, I tried to build a JSON file from my R data frame, but the file I produce cannot be read by the docs_bulk function. It apparently needs the same format as your plos_data or shakespeare_data JSON files, and I have not been able to reproduce that format. Could you, for example, give the R code that turns the mtcars or iris data frame into a JSON file that docs_bulk can read?
  • Second, I tried passing my data frame directly to docs_bulk, but then the Search function doesn't work properly for me. For example, after running your example command docs_bulk(mtcars, index = "hello", type = "world"), the command Search(index="hello", type = "world", q="14.7", size=10)$hits$hits yields 10 results instead of 1 (all 10 results refer to the same observation in the data frame). How can I fix the problem?
sckott commented 8 years ago

1st question:

The Elasticsearch bulk API requires an unusual format. The file as a whole is not a valid JSON document: it is newline-delimited JSON, in which each line is its own JSON object (an action line followed by a line with the document itself) and every line, including the last, must end with a newline. I have a few internal functions in the package that create the bulk load format, but only for specific data sources. Here they are: https://github.com/ropensci/elastic/blob/master/R/docs_bulk.r#L298-L339. I don't think they can really be generalized, but you can easily modify them for your own data.
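
As a rough illustration of that format, here is a minimal sketch that writes a data frame such as mtcars to a bulk file. It is not the package's internal code: make_bulk_file is a hypothetical helper named only for this example, it assumes the jsonlite package is installed, and the zero-based row number used as _id is an arbitrary choice.

```r
library(jsonlite)

# Hypothetical helper (not part of the elastic package): write a data frame
# in the bulk format, one action line followed by one document line per row.
make_bulk_file <- function(df, index, type, path) {
  con <- file(path, open = "w")
  on.exit(close(con))
  for (i in seq_len(nrow(df))) {
    # Action line: tells Elasticsearch where to index the next document.
    # Using the zero-based row number as _id is an assumption of this sketch.
    action <- list(index = list(`_index` = index, `_type` = type, `_id` = i - 1L))
    # Document line: the row as a flat JSON object (row names are dropped).
    doc <- as.list(df[i, , drop = FALSE])
    # writeLines() terminates every line, including the last, with a newline.
    writeLines(as.character(toJSON(action, auto_unbox = TRUE)), con)
    writeLines(as.character(toJSON(doc, auto_unbox = TRUE, digits = NA)), con)
  }
  invisible(path)
}

path <- make_bulk_file(mtcars, "hello", "world", tempfile(fileext = ".json"))
```

The resulting file should be loadable the same way as the bundled example files, by passing its path to docs_bulk() once a connection to a running Elasticsearch instance is set up. That step is left out here because it depends on your server.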

sckott commented 8 years ago

2nd question:

That example works for me, and returns only 1 result.

library("elastic")
connect()
docs_bulk(mtcars, index = "hello", type = "world")
Search(index="hello", type = "world", q="14.7", size=10)$hits$total
#> [1] 1

What do you get when you try that example? Make sure that the hello index does not already exist; for example, run index_recreate("hello") before the docs_bulk() command.

sckott commented 8 years ago

Closing; this issue was just to answer an email-based question here.