ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Using jsonlite::stream_in() handler argument in elastic::Search() #160

Closed nelsonSchwarz closed 7 years ago

nelsonSchwarz commented 7 years ago

Hi Sckott,

First off, great work with the elastic package! I especially enjoy the asdf argument in elastic::Search(). It really helps streamline formatting the output.

On that note, would it be possible to enable the handler argument from the jsonlite::stream_in() function, in order to further format and handle ES output? I can try it out myself and make a pull request if you'd like.

Thanks for your time, Nelson

nelsonSchwarz commented 7 years ago

Oops. I had not seen that Search uses fromJSON, not stream_in, my mistake. Nonetheless, is there any way of handling the data between batches like handler in stream_in, maybe in conjunction with scroll?

sckott commented 7 years ago

Thanks for the issue and for the kind words.

What is the use case for this? And is the current situation too slow perhaps?

nelsonSchwarz commented 7 years ago

Actually, I apologize for not taking a look at the Scrolling search - instead of paging section in your search vignette. The last while statement is exactly what I want to do.

Nonetheless, the idea was to format the JSON into a data frame and further manipulate the data (e.g. with dplyr) in batches. I am going to be dealing with a lot of data (dozens of GBs), and scrolling through all of it to then manipulate it in memory is not feasible.

I'll see if it is slow or not, soon enough.

Sorry for the bother... but, at the same time, expect me bothering for the next couple of days XD

sckott commented 7 years ago

i see. looking at it. might be possible

sckott commented 7 years ago

@nelsonSchwarz reinstall and try again, see e.g., https://github.com/ropensci/elastic/blob/master/man-roxygen/search_egs.r#L818-L823 and https://github.com/ropensci/elastic/blob/master/R/scroll.R#L186-L210

sckott commented 7 years ago

i don't think stream_in makes a whole lot of sense here since it expects ndjson formatted data in a file, not regular JSON

i played around with this and ended up adding the ability to use stream_out: instead of getting an R list or raw JSON, you can write the results to disk as ndjson, and then use jsonlite::stream_in to read the ndjson into a data.frame
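
The ndjson round trip described above can be sketched with jsonlite alone. This is only a minimal illustration, not the code added to elastic: the `docs` data frame is a made-up stand-in for search results, and the page size of 4 is arbitrary. It assumes that batch-wise processing is the goal, so it reads the file back through the `handler` argument of `jsonlite::stream_in` rather than loading everything at once.

```r
library(jsonlite)

# Hypothetical stand-in for documents returned by a search; not real ES output.
docs <- data.frame(
  id    = 1:10,
  score = seq(0.1, 1.0, by = 0.1)
)

# Write the records to disk as ndjson (one JSON object per line).
path <- tempfile(fileext = ".ndjson")
out <- file(path, open = "wb")
stream_out(docs, out, verbose = FALSE)
close(out)

# Read the file back in pages of 4 records. The handler receives each page
# as a data.frame, so only one batch needs to be held and processed at a time.
batch_sizes <- integer(0)
batch_sums  <- numeric(0)
stream_in(
  file(path),
  handler = function(df) {
    batch_sizes <<- c(batch_sizes, nrow(df))
    batch_sums  <<- c(batch_sums, sum(df$score))
  },
  pagesize = 4,
  verbose = FALSE
)

# Remove the temporary file.
unlink(path)
```

Inside the handler, each `df` could be passed to dplyr or written elsewhere instead of being summarised, which keeps memory use bounded by the page size rather than by the total result size.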

does that help?

sckott commented 7 years ago

sorted