pdphilip / laravel-elasticsearch

Laravel Elasticsearch: An Elasticsearch implementation of Laravel's Eloquent ORM

composite aggregation with the after_key to paginate the large documents #35

Closed DevYSM closed 4 weeks ago

DevYSM commented 1 month ago

Hello, I'm using rawAggregation and everything works well. But when we use a composite aggregation to paginate a large result set, we need the after_key from each response. We pass it as the after parameter in the next request to tell Elasticsearch where to start. How can I get it?

'composite' => [
    'size' => $per_page,
    'sources' => [
        [
            'entity_id' => [
                'terms' => [
                    'field' => 'entity_id',
                ],
            ],
        ],
    ],
    'after' => [
        'entity_id' => $after_key,
    ],
],

The above is an example of the query sent to Elasticsearch,

and this is an example of the current response:

"agg_by_entity_id" => array:10 [▼
      0 => array:7 [▼
        "key" => array:1 [▶]
        "doc_count" => 1
        "total_calls" => array:1 [▶]
        "total_whatsapp" => array:1 [▶]
        "total_visits" => array:1 [▶]
        "entity_details" => array:1 [▶]
        "total_chat" => array:1 [▶]
      ]

This is the expected behavior

array:1 [▼ // vendor/pdphilip/elasticsearch/src/DSL/Bridge.php:1098
  "agg_by_entity_id" => array:2 [▼
    "after_key" => array:1 [▼
      "entity_id" => 2693535.0
    ]
    "buckets" => array:10 [▼
      0 => array:7 [▶]
      1 => array:7 [▶]
      2 => array:7 [▶]
      3 => array:7 [▶]
      4 => array:7 [▶]
      5 => array:7 [▶]
      6 => array:7 [▶]
      7 => array:7 [▶]
      8 => array:7 [▶]
      9 => array:7 [▶]
    ]
  ]
]

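The composite clause in the query above could be assembled by a small helper so that `after` is sent only once a previous response has supplied a key. This is a minimal sketch: `compositeClause()` is a hypothetical name and not part of the package, and it assumes a single `terms` source as in the example.

```php
<?php

/**
 * Build the `composite` clause for one page of results.
 * Hypothetical helper for illustration; not part of the package.
 */
function compositeClause(string $field, int $size, ?array $afterKey = null): array
{
    $clause = [
        'size' => $size,
        'sources' => [
            [$field => ['terms' => ['field' => $field]]],
        ],
    ];

    // The first request has no previous key, so `after` is left out entirely.
    if ($afterKey !== null) {
        $clause['after'] = $afterKey;
    }

    return $clause;
}

// First page: no `after`. Next page: pass the after_key from the last response.
$firstPage = compositeClause('entity_id', 10);
$nextPage  = compositeClause('entity_id', 10, ['entity_id' => 2693535]);
```

Passing the whole `after_key` array back unchanged, rather than rebuilding it by hand, keeps the value types exactly as Elasticsearch returned them.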
DevYSM commented 1 month ago

@pdphilip Can you check this? It's very important for paginating the results.

DevYSM commented 1 month ago

I have edited this method to implement the expected behavior

[image: screenshot of the edited method]

pdphilip commented 1 month ago

Hey @DevYSM ,

I'm not sure what you mean by paginating results for aggregations. Is this related to issue #34 ?

Please share the full code that you're trying to implement so that I can attempt to recreate (and understand) what you're trying to do. And link back to the ES docs if you can.

Thanks

DevYSM commented 1 month ago

Hey @pdphilip, thank you for your interest. You can check the Elasticsearch documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html

Let me explain what the composite aggregation is for: it paginates the buckets of a very large aggregation. In my case I have 20M+ documents, and a normal aggregation over them takes a long time.

The composite aggregation is magic because each request returns only one page of up to 1000 buckets.

And what does the after_key mean?

The after_key is a pointer that tells Elasticsearch to start after the last key returned by the previous request.

And really, thank you for your great package.
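The paging described above can be sketched as a loop that feeds each `after_key` back into the next request. This is only an illustration under stated assumptions: `compositeBuckets()` is a hypothetical helper, `$search` stands for whatever executes the request and returns the raw Elasticsearch response as an array, and `$fakeSearch` is an in-memory stand-in used so the example runs without a cluster.

```php
<?php

/**
 * Yield every bucket of a composite aggregation, page by page.
 * Hypothetical helper; $search must return the raw response as an array.
 */
function compositeBuckets(callable $search, array $body, string $aggName): Generator
{
    do {
        $response = $search($body);
        $agg = $response['aggregations'][$aggName] ?? [];
        $buckets = $agg['buckets'] ?? [];

        foreach ($buckets as $bucket) {
            yield $bucket;
        }

        // Continue only while the page had buckets and carried an after_key.
        $afterKey = $buckets === [] ? null : ($agg['after_key'] ?? null);
        if ($afterKey !== null) {
            $body['aggs'][$aggName]['composite']['after'] = $afterKey;
        }
    } while ($afterKey !== null);
}

// In-memory stand-in for Elasticsearch, used only to exercise the loop.
$ids = [11, 12, 13, 14, 15];
$fakeSearch = function (array $body) use ($ids): array {
    $composite = $body['aggs']['agg_by_entity_id']['composite'];
    $after = $composite['after']['entity_id'] ?? null;
    $remaining = array_values(array_filter($ids, fn ($id) => $after === null || $id > $after));
    $page = array_slice($remaining, 0, $composite['size']);
    $agg = ['buckets' => array_map(
        fn ($id) => ['key' => ['entity_id' => $id], 'doc_count' => 1],
        $page
    )];
    if ($page !== []) {
        $agg['after_key'] = ['entity_id' => end($page)];
    }
    return ['aggregations' => ['agg_by_entity_id' => $agg]];
};

$body = ['size' => 0, 'aggs' => ['agg_by_entity_id' => ['composite' => [
    'size' => 2,
    'sources' => [['entity_id' => ['terms' => ['field' => 'entity_id']]]],
]]]];

$seen = [];
foreach (compositeBuckets($fakeSearch, $body, 'agg_by_entity_id') as $bucket) {
    $seen[] = $bucket['key']['entity_id'];
}
```

Using a generator means only one page of buckets is held in memory at a time, which matters when the full result has millions of keys.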

pdphilip commented 1 month ago

Ok, and share your full code (that's failing) please so I can see it in context. Thanks

DevYSM commented 1 month ago

Which code do you need to see?

Ok, and share your full code (that's failing) please so I can see it in context. Thanks

DevYSM commented 1 month ago

This is the query I have to use.

$array = [
    'size' => 0,
    'query' => [
        'bool' => [
            'must' => [
                [
                    'range' => [
                        'day' => [
                            'gte' => '2024-08-07',
                            'lte' => '2024-09-07',
                            'format' => 'yyyy-MM-dd',
                        ],
                    ],
                ],
                [
                    'term' => [
                        'country_alias' => 'kw',
                    ],
                ]
            ],
        ],
    ],
    'aggs' => [
        'agg_by_entity_id' => [
            'composite' => [
                'size' => 1000,
                'sources' => [
                    [
                        'entity_id' => [
                            'terms' => [
                                'field' => 'entity_id',
                            ],
                        ],
                    ],
                ],
                'after' => [
                    'entity_id' => '8653612',
                ],
            ],
            'aggs' => [
                'entity_details' => [
                    'top_hits' => [
                        'size' => 1,
                        '_source' => [
                            'includes' => [
                                'entity_id',
                                'entity_type',
                                'entity_title',
                                'country_alias',
                                'main_taxonomy',
                                'main_taxonomy_title',
                                'owner_phone',
                                'created_at',
                                'updated_at',
                            ],
                        ],
                    ],
                ],
                'total_visits' => [
                    'sum' => [
                        'script' => [
                            'lang' => 'painless',
                            'source' => "doc['visit_ios_count'].value + doc['visit_android_count'].value + doc['visit_huawei_count'].value + doc['visit_web_count'].value",
                        ],
                    ],
                ],
                'total_calls' => [
                    'sum' => [
                        'script' => [
                            'lang' => 'painless',
                            'source' => "doc['call_ios_count'].value + doc['call_android_count'].value + doc['call_huawei_count'].value + doc['call_web_count'].value",
                        ],
                    ],
                ],
                'total_whatsapp' => [
                    'sum' => [
                        'script' => [
                            'lang' => 'painless',
                            'source' => "doc['whatsapp_ios_count'].value + doc['whatsapp_android_count'].value + doc['whatsapp_huawei_count'].value + doc['whatsapp_web_count'].value",
                        ],
                    ],
                ],
                'total_chat' => [
                    'sum' => [
                        'script' => [
                            'lang' => 'painless',
                            'source' => "doc['chat_ios_count'].value + doc['chat_android_count'].value + doc['chat_huawei_count'].value + doc['chat_web_count'].value",
                        ],
                    ],
                ],
            ],
        ],
    ],
];

And this is the expected behavior:

"entity_details" => array:2 [▼
  "after_key" => array:1 ["entity_id" => 2693535]
  "buckets" => array:10 [
    0 => array:7 [▶]
    1 => array:7 [▶]
    2 => array:7 [▶]
    3 => array:7 [▶]
    4 => array:7 [▶]
    5 => array:7 [▶]
    6 => array:7 [▶]
    7 => array:7 [▶]
    8 => array:7 [▶]
    9 => array:7 [▶]
  ]
]

pdphilip commented 1 month ago

I mean the PHP/Laravel code that uses this package, so I can see your example.

Please give as much detail about your model as possible.

DevYSM commented 1 month ago

Good morning. Do you need more code to understand what I mean?

pdphilip commented 4 weeks ago

Hi @DevYSM , update to the latest release and try the rawAggregation now. The update fixed the issue where only 1 agg result was being returned.

However, given the complexity of your query, it may not return all the data you need. If that's the case, this update also upgrades rawSearch(): pass your body with a second parameter set to true to get the full, unsanitized result. This will give you what you need, e.g.:

YourModel::rawSearch($body, true);

Good luck!
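Once the full result is available, the `after_key` and the per-bucket values can be read out of it. The sketch below assumes the unsanitized result has the same shape as a raw Elasticsearch search response (an `aggregations` key holding `after_key` and `buckets`); that shape is an assumption about the package's output, `readCompositePage()` is a hypothetical helper, and `$raw` is hand-written sample data rather than real output.

```php
<?php

/**
 * Split one composite aggregation result into its after_key and flat rows.
 * Hypothetical helper; assumes $raw mirrors a raw Elasticsearch response.
 */
function readCompositePage(array $raw, string $aggName): array
{
    $agg = $raw['aggregations'][$aggName] ?? [];
    $rows = [];

    foreach ($agg['buckets'] ?? [] as $bucket) {
        $rows[] = [
            'entity_id'    => $bucket['key']['entity_id'],
            'doc_count'    => $bucket['doc_count'],
            // A sum aggregation reports its result under "value".
            'total_visits' => $bucket['total_visits']['value'] ?? null,
            // top_hits nests the matching documents under hits.hits.
            'details'      => $bucket['entity_details']['hits']['hits'][0]['_source'] ?? null,
        ];
    }

    return ['after_key' => $agg['after_key'] ?? null, 'rows' => $rows];
}

// Hand-written sample; in practice $raw would come from the raw search call.
$raw = ['aggregations' => ['agg_by_entity_id' => [
    'after_key' => ['entity_id' => 2693535],
    'buckets' => [[
        'key' => ['entity_id' => 2693535],
        'doc_count' => 1,
        'total_visits' => ['value' => 7.0],
        'entity_details' => ['hits' => ['hits' => [
            ['_source' => ['entity_id' => 2693535]],
        ]]],
    ]],
]]];

$page = readCompositePage($raw, 'agg_by_entity_id');
```

The returned `after_key` is what goes into the `after` parameter of the next request; a `null` value signals that there are no more pages.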

DevYSM commented 4 weeks ago

Great update, thank you. It's working fine <3