ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Extract aggregated data from Elasticsearch #227

Closed aleksaschmidt closed 5 years ago

aleksaschmidt commented 5 years ago

Hi, I'm trying to request aggregated data from Elasticsearch in RStudio. My code:

aggs <- '{
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          {"function": {"terms": {"field": "function"}}},
          {"session": {"terms": {"field": "session_id"}}},
          {"start": {"terms": {"field": "start_time"}}}
        ]
      }
    }
  },
  "query": {
    "bool": {
      "must": {"match_all": {}},
      "filter": {"terms": {"status": ["finish"]}}
    }
  }
}'
output <- Search(index = "functions_use", body = aggs, asdf = TRUE)$hits$hits
output

However, what I get is all data, not only the three attributes (function, session_id, start_time) that I defined in the aggs. What needs to be done to get only these three attributes?

I use Elasticsearch 6.2

sckott commented 5 years ago

session info please? the output of devtools::session_info()

sckott commented 5 years ago

thanks for your question @aleksaschmidt !

What do you mean by "all data"?

Any chance you can make a reproducible example with publicly available data? That would make it easier for me to help.

aleksaschmidt commented 5 years ago

Session Info:

 setting  value                       
 version  R version 3.4.2 (2017-09-28)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  C                           
 tz       <NA>                        
 date     2018-07-20  

I have now used the shakespeare dataset.

aggs <- '{
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          {"play_name": {"terms": {"field": "play_name"}}},
          {"speech_number": {"terms": {"field": "speech_number"}}}
        ]
      }
    }
  },
  "query": {
    "bool": {
      "must": {"match_all": {}},
      "filter": {"terms": {"speaker": ["westmoreland"]}}
    }
  }
}'
output <- Search(index = "shakespeare", body = aggs, asdf = TRUE)$hits$hits
output

And I got this error: Unknown BaseAggregationBuilder [composite]. It seems that I cannot do composite aggregations. What I am actually trying to do is filter the data by speaker and then get only two attributes of the filtered data: play_name and speech_number.

sckott commented 5 years ago

thanks, will have a look

aleksaschmidt commented 5 years ago

Hi, I have found a way to get the filtered data. I used this script:

output <- Search(index = "shakespeare", q = "speaker:westmoreland", body = '{"_source" : ["play_name", "speech_number"]}', asdf = TRUE)$hits$hits
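
The same request can also be sketched with the body built from an R list instead of a hand-written JSON string, which avoids quoting mistakes. This is only an illustration: `source_filter_body` is a hypothetical helper, the `jsonlite` package is assumed to be installed, and the `q` parameter is replaced here by what should be an equivalent `query_string` query inside the body.

```r
library(jsonlite)

# Hypothetical helper (not part of the elastic package): build a request
# body that restricts the returned `_source` fields and applies a
# Lucene-style query string.
source_filter_body <- function(fields, query_string) {
  toJSON(
    list(
      `_source` = as.list(fields),  # unnamed list -> JSON array
      query = list(query_string = list(query = query_string))
    ),
    auto_unbox = TRUE               # length-1 vectors become JSON scalars
  )
}

body <- source_filter_body(c("play_name", "speech_number"),
                           "speaker:westmoreland")
cat(body, "\n")

# Not run: needs a live Elasticsearch server with the shakespeare index.
# output <- Search(index = "shakespeare", body = body, asdf = TRUE)$hits$hits
```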

sckott commented 5 years ago

Great. This https://github.com/ropensci/elastic/issues/227#issuecomment-406652239 is still a question though, correct?

aleksaschmidt commented 5 years ago

Yep. I still want to know about composite aggregations.

sckott commented 5 years ago

@aleksaschmidt I'm not sure how to do composite aggregations. It should just be a JSON query that's passed to body. I'd search around Stack Overflow. Sorry I can't be of more help on this.
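
For later readers, here is a minimal sketch of how a composite aggregation body might be built and where its results would be read from. It rests on several assumptions: the server is Elasticsearch 6.1 or newer (as far as I know the composite aggregation was introduced in 6.1, so an `Unknown BaseAggregationBuilder [composite]` error may simply mean the server is older), `jsonlite` is installed, and `composite_body` is a hypothetical helper, not part of the elastic package. Two points it illustrates: aggregation results are returned under `$aggregations` rather than `$hits$hits`, and `"size": 0` at the top level suppresses the document hits.

```r
library(jsonlite)

# Hypothetical helper: build a composite-aggregation request body.
# `fields` is a named character vector: names are the source names,
# values are the index fields they aggregate on.
composite_body <- function(fields, filter_field, filter_values,
                           page_size = 100) {
  # One {"<name>": {"terms": {"field": "<field>"}}} object per source;
  # unname() keeps the outer list unnamed so it serialises as a JSON array.
  sources <- unname(Map(function(name, field) {
    setNames(list(list(terms = list(field = field))), name)
  }, names(fields), fields))

  body <- list(
    size = 0,  # return no document hits, only aggregation results
    query = list(bool = list(
      filter = list(terms = setNames(list(as.list(filter_values)),
                                     filter_field))
    )),
    aggs = list(my_buckets = list(
      composite = list(size = page_size, sources = sources)
    ))
  )
  toJSON(body, auto_unbox = TRUE)
}

body <- composite_body(
  c(play_name = "play_name", speech_number = "speech_number"),
  filter_field = "speaker",
  filter_values = "westmoreland"
)
cat(body, "\n")

# Not run: needs a live server that supports composite aggregations.
# res <- Search(index = "shakespeare", body = body, asdf = TRUE)
# buckets <- res$aggregations$my_buckets$buckets
```

The `"must": {"match_all": {}}` clause from the original query is left out because the `filter` clause alone already selects the documents. The name `my_buckets` is taken from the question; any aggregation name works as long as the same name is used when reading the result.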