Closed nelsonSchwarz closed 7 years ago
Oops. I had not seen that Search uses fromJSON, not stream_in, my mistake. Nonetheless, is there any way of handling the data between batches, like handler in stream_in, maybe in conjunction with scroll?
Thanks for the issue and for the kind words.
What is the use case for this? And is the current situation too slow, perhaps?
Actually, I apologize for not taking a look at the "Scrolling search - instead of paging" section in your search vignette. The last while statement is exactly what I want to do.
Nonetheless, the idea was to format the JSON into a data frame and further manipulate the data (e.g. with dplyr) in batches. I am going to be dealing with a lot of data (dozens of GBs), and scrolling through all of it to then manipulate it in memory is not feasible.
I'll see if it is slow or not, soon enough.
Sorry for the bother... but, at the same time, expect me bothering for the next couple of days XD
i see. looking at it. might be possible
@nelsonSchwarz reinstall and try again, see e.g., https://github.com/ropensci/elastic/blob/master/man-roxygen/search_egs.r#L818-L823 and https://github.com/ropensci/elastic/blob/master/R/scroll.R#L186-L210
i don't think stream_in makes a whole lot of sense here since it expects ndjson formatted data in a file, not regular JSON
i played around with this, and ended up adding the ability to use stream_out, so instead of getting an R list or raw JSON, you can write to disk as ndjson and then use jsonlite::stream_in to read the ndjson into a data.frame
does that help?
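For the batch-wise handling asked about above, a minimal sketch of the second half of that workflow, using only jsonlite. It assumes the ndjson file already exists on disk; here a small made-up data frame (the id and value columns are hypothetical) is written with stream_out as a stand-in for the scrolled Elasticsearch output, and the elastic calls themselves are left out.

```r
library(jsonlite)

# Stand-in for the ndjson file written while scrolling (made-up data).
path <- tempfile(fileext = ".ndjson")
docs <- data.frame(id = 1:10, value = (1:10) * 2)
con_out <- file(path, open = "w")
stream_out(docs, con_out, verbose = FALSE)
close(con_out)

# Read the file back in pages of 4 records. The handler receives each
# page as a data.frame, so only one page plus the running summaries
# are held in memory at any time.
batch_rows <- integer(0)
total <- 0
stream_in(
  file(path),
  handler = function(df) {
    batch_rows <<- c(batch_rows, nrow(df))  # rows seen in this page
    total <<- total + sum(df$value)         # per-batch aggregation
  },
  pagesize = 4,
  verbose = FALSE
)
```

The handler body is where any dplyr manipulation of a batch would go; pagesize controls how many ndjson lines are parsed per call.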
sorted
Hi Sckott,
First off, great work with the elastic package! I especially enjoy the asdf argument in elastic::Search(). It really helps streamline formatting the output.

On that note, would it be possible to enable the handler argument from the jsonlite::stream_in() function, in order to further format and handle ES output? I can try it out myself and make a pull request if you'd like.

Thanks for your time,
Nelson