manticoresoftware / manticoresearch-php

Official PHP client for Manticore Search
MIT License
171 stars 32 forks source link

Add a method to get total facets count #67

Open markomilivojevic opened 3 years ago

markomilivojevic commented 3 years ago

The library doesn't have a method to get total count of facets.

Here is the code example, imagine an index jobs with columns employer_id and employer_name:

$client = new \Manticoresearch\Client();
$index = $client->index('jobs');

$query = new \Manticoresearch\Query\BoolQuery();
$query->must(new \Manticoresearch\Query\MatchQuery('amazon', 'employer_name'));

$search = $index->search($query);
$search->limit(0);
$search->offset($offset);
$search->facet('employer_id', null, $per_page);
$result = $search->get();

var_dump($result->getTotal()); // this returns total number of rows, not total number of facets
var_dump($result->getFacets()); // this returns facets

So, adding a method similar to this one could solve the issue:

$result->getTotalFacets()
sanikolaev commented 3 years ago

Please elaborate more on why it's important. Can't you just calculate count of $result->getFacets() or did I get you wrong on what you mean by "total count of facets" ?

markomilivojevic commented 2 years ago

After experimenting with manticore json API, I realised that the issue belongs to manticoresearch core library rather than php client.

To clarify issue:

The code above will hit /json/search endpoint, and it will produce two queries on manticore side:

  1. SELECT * FROM jobs WHERE MATCH('{"bool":{"must":[{"match":{"employer_name":"amazon"}}]}}') LIMIT 0,0
  2. SELECT employer_id FROM jobs WHERE MATCH('{"bool":{"must":[{"match":{"employer_name":"amazon"}}]}}') GROUP BY employer_id WITHIN GROUP ORDER BY weight() desc ORDER BY @groupby desc LIMIT 0,30

And behind the scene, it executes show meta only for the first query, in order to get total number of found rows, which is useless in my case. What I need is show meta from the second query which will return total number of found employers.

To answer your question: $result->getFacets() returns limited number of facets, I set in variable $per_page. Total count of facets needs to be returned by /json/search endpoint.

sanikolaev commented 2 years ago

OK, I think I see your point. Please review and confirm I understood it right:

So the minimal reproducible example is:

mysql -P9306 -h0 -e "drop table if exists t; create table t(f text, a int); insert into t(a) values(1),(2),(1)";

No "total" returned in "aggregations":

curl -sX POST http://localhost:9308/search  -d '{"index":"t","limit":0,"offset":0,"aggs":{"a":{"terms":{"field":"a","size":10}}}}'|jq .
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 3,
    "hits": []
  },
  "aggregations": {
    "a": {
      "buckets": [
        {
          "key": 2,
          "doc_count": 1
        },
        {
          "key": 1,
          "doc_count": 2
        }
      ]
    }
  }
}

(should be 2 in this case).

and when the number of facets is limited it's impossible to know how many can be returned in total:

curl -sX POST http://localhost:9308/search  -d '{"index":"t","limit":0,"offset":0,"aggs":{"a":{"terms":{"field":"a","size":1}}}}'|jq .
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 3,
    "hits": []
  },
  "aggregations": {
    "a": {
      "buckets": [
        {
          "key": 2,
          "doc_count": 1
        }
      ]
    }
  }
}

In SQL it can be seen in SHOW META after select ... group by:

mysql> select * from t group by a;
+---------------------+------+------+
| id                  | a    | f    |
+---------------------+------+------+
| 1514488646869188929 |    1 |      |
| 1514488646869188930 |    2 |      |
+---------------------+------+------+
2 rows in set (0.00 sec)

mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 2     |
| total_found   | 2     |
| time          | 0.000 |
+---------------+-------+
3 rows in set (0.00 sec)

Since the JSON protocol doesn't return it, the php client can't return it too.

markomilivojevic commented 2 years ago

Yes, you understood it correctly.