ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic
Other
245 stars 58 forks source link

Fix bulk update with bool field #240

Closed dpmccabe closed 5 years ago

dpmccabe commented 5 years ago

Fixes #239: In docs_bulk_update boolean fields get cast as strings. Just need to run as.list on each data.frame row, not on the entire data.frame.

codecov-io commented 5 years ago

Codecov Report

Merging #240 into master will not change coverage. The diff coverage is 0%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #240   +/-   ##
=====================================
  Coverage       0%     0%           
=====================================
  Files          43     43           
  Lines        1502   1502           
=====================================
  Misses       1502   1502
Impacted Files Coverage Δ
R/docs_bulk_update.R 0% <0%> (ø) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f8a8d1c...448f0b8. Read the comment docs.

dpmccabe commented 5 years ago

I think there might be more to this fix in that the new code doesn't work with nested data. Looking into it...

sckott commented 5 years ago

you mean a list inside a data.frame?

dpmccabe commented 5 years ago

Yes, running as.list on each row messes up list columns. make_bulk doesn't have any issues here because it just calls 'toJSON` on the data.frame.

sckott commented 5 years ago

right, the update fxn is a little different as we need to set the id to null and set the doc as upsert field

sckott commented 5 years ago

if no test, can you at least show an example here with nested data

dpmccabe commented 5 years ago

My latest (hopefully final) commit should fix this. I think you actually need dplyr as a dependency if want want to test this. Here's some example data

> df
# A tibble: 3 x 3
  id                                       title                event_instances 
  <chr>                                    <chr>                <list>          
1 f9dd4b9105a8076c23d998897906e95f8dad33ee CCB Safety Training  <tibble [1 × 4]>
2 6b54d6f20c1b4c047658f4b3c78daf6dca81b6a6 Frankenreads         <tibble [1 × 4]>
3 42df3e4c8e743e927531f31d9ecd203e5bdb505e Nine Moments for Now <tibble [4 × 4]>

> df$event_instances
[[1]]
# A tibble: 1 x 4
  all_day dt                        end_dt location_text
  <lgl>   <chr>                     <chr>  <chr>        
1 NA      2017-12-20 14:00:00 -0400 NA     NA           

[[2]]
# A tibble: 1 x 4
  all_day dt                        end_dt                    location_text                                                                  
  <lgl>   <chr>                     <chr>                     <chr>                                                                          
1 FALSE   2018-10-31 09:00:00 -0400 2018-10-31 16:00:00 -0400 "Houghton Library, Edison and Newman Room\nQuincy St. & Harvard St., Cambridge"

[[3]]
# A tibble: 4 x 4
  all_day dt                        end_dt                    location_text                                                                                      
  <lgl>   <chr>                     <chr>                     <chr>                                                                                              
1 FALSE   2018-10-31 10:00:00 -0400 2018-10-31 17:00:00 -0400 "The Ethelbert Cooper Gallery of African and African American Art\n102 Mount Auburn St.\nCambridge"
2 FALSE   2018-11-01 10:00:00 -0400 2018-11-01 17:00:00 -0400 "The Ethelbert Cooper Gallery of African and African American Art\n102 Mount Auburn St.\nCambridge"
3 FALSE   2018-11-02 10:00:00 -0400 2018-11-02 17:00:00 -0400 "The Ethelbert Cooper Gallery of African and African American Art\n102 Mount Auburn St.\nCambridge"
4 FALSE   2018-11-03 10:00:00 -0400 2018-11-03 17:00:00 -0400 "The Ethelbert Cooper Gallery of African and African American Art\n102 Mount Auburn St.\nCambridge"

The new make_bulk_update function generates this:

{"update":{"_index":"events","_type":"events","_id":"f9dd4b9105a8076c23d998897906e95f8dad33ee"}}
{"doc":{"title":"CCB Safety Training","event_instances":[{"all_day":null,"dt":"2017-12-20 14:00:00 -0400","end_dt":null,"location_text":null}]},"doc_as_upsert":true}
{"update":{"_index":"events","_type":"events","_id":"6b54d6f20c1b4c047658f4b3c78daf6dca81b6a6"}}
{"doc":{"title":"Frankenreads","event_instances":[{"all_day":false,"dt":"2018-10-31 09:00:00 -0400","end_dt":"2018-10-31 16:00:00 -0400","location_text":"Houghton Library, Edison and Newman Room\nQuincy St. & Harvard St., Cambridge"}]},"doc_as_upsert":true}
{"update":{"_index":"events","_type":"events","_id":"42df3e4c8e743e927531f31d9ecd203e5bdb505e"}}
{"doc":{"title":"Nine Moments for Now","event_instances":[{"all_day":false,"dt":"2018-10-31 10:00:00 -0400","end_dt":"2018-10-31 17:00:00 -0400","location_text":"The Ethelbert Cooper Gallery of African and African American Art\n102 Mount Auburn St.\nCambridge"},{"all_day":false,"dt":"2018-11-01 10:00:00 -0400","end_dt":"2018-11-01 17:00:00 -0400","location_text":"The Ethelbert Cooper Gallery of African and African American Art\n102 Mount Auburn St.\nCambridge"},{"all_day":false,"dt":"2018-11-02 10:00:00 -0400","end_dt":"2018-11-02 17:00:00 -0400","location_text":"The Ethelbert Cooper Gallery of African and African American Art\n102 Mount Auburn St.\nCambridge"},{"all_day":false,"dt":"2018-11-03 10:00:00 -0400","end_dt":"2018-11-03 17:00:00 -0400","location_text":"The Ethelbert Cooper Gallery of African and African American Art\n102 Mount Auburn St.\nCambridge"}]},"doc_as_upsert":true}

This should work recursively for any level of nested data.

sckott commented 5 years ago

Maybe a reproducible example?

Did you actually test this code? if you install the package on your branch, then use it, it can't find unbox() from jsonlite -

make sure to test it pleae

dpmccabe commented 5 years ago

I should have some time this week to write some unit tests. Will add to this pull req.

sckott commented 5 years ago

any thoughts @dpmccabe ?

dpmccabe commented 5 years ago

Sorry for going silent. I'll add some unit tests to this PR by this weekend.

sckott commented 5 years ago

thanks

dpmccabe commented 5 years ago

Unit test added (and passing).

sckott commented 5 years ago

thanks, having a look

sckott commented 5 years ago

LGTM