Open Lot-Art opened 1 year ago
In case it has something to do with cutoff
, here is a strange behaviour. See how increasing the cutoff from 1,000 to 10,000 lowers the number of results.
Does option accurate_aggregation=1
help?
No it behaves the same.
It feels like without MATCH()
the result is nondeterministic?
I will do a workaround in the app: I will insert a gibberish word at the end of each document, and when the user searches without a keyword, the app will search for that gibberish word. That should return all documents (that would be the expected behavior in my opinion). I will post my finding here.
Please remove the filter by id an try again
Without id the result is the same as on the screenshot (incorrect count). Im running all these in the console. It's an RT index.
SIMPLE
SELECT count(*) FROM
la8_items_upcaftWHERE struct_category = 'modernart' OPTION cutoff=0;
15933
SELECT count(*) FROM
la8_items_upcaftWHERE struct_category = 'modernart' OPTION cutoff=1000;
60117
SELECT count(*) FROM
la8_items_upcaftWHERE struct_category = 'modernart' OPTION cutoff=10000
; 15933
SELECT count(*) FROM
la8_items_upcaftWHERE struct_category = 'modernart' OPTION cutoff=0, accurate_aggregation=1;
15933
SELECT count(*) FROM
la8_items_upcaftWHERE struct_category = 'modernart' OPTION cutoff=1000, accurate_aggregation=1;
60117
SELECT count(*) FROM
la8_items_upcaftWHERE struct_category = 'modernart' OPTION cutoff=10000, accurate_aggregation=1;
15933
GROUP BY
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category;
modernart | 60117
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category;
modernart | 60117
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category OPTION cutoff=0;
modernart | 60117
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category OPTION cutoff=1000;
modernart | 60117
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category OPTION cutoff=10000;
modernart | 60117
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category OPTION cutoff=0, accurate_aggregation=1;
modernart | 60117
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category OPTION cutoff=1000, accurate_aggregation=1;
modernart | 60117
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category OPTION cutoff=10000, accurate_aggregation=1;
modernart | 60117
(cutoff=10)
SELECT struct_category, count(*) FROM
la8_items_upcaftGROUP BY struct_category OPTION cutoff=10, accurate_aggregation=1;
modernart | 10557 (and only 10 lines are there as expected)
Please show the output of
SELECT struct_category, count(*) FROM la8_items_upcaft GROUP BY struct_category OPTION accurate_aggregation=1
Here it is (some docs has been added since my opening post, but the different counts remain the problem)
MySQL [(none)]> SELECT struct_category, count(*) FROM la8_items_upcaft GROUP BY struct_category OPTION accurate_aggregation=1;
+--------------------+----------+
| struct_category | count(*) |
+--------------------+----------+
| bookmanuscr | 15288 |
| silverware | 21259 |
| watchclock | 24968 |
| modernart | 60455 |
| antiqfurn | 39999 |
| jewel | 93624 |
| contempart | 54501 |
| misc | 385266 |
| ethnoart | 933 |
| car | 1419 |
| art_unknown_period | 30671 |
| designfurn | 20063 |
| collectibles | 41006 |
| asianart | 13578 |
| tableware | 17937 |
| carpetrug | 17353 |
| glassware | 9964 |
| arms | 6676 |
| oldmasterart | 9583 |
| ancientart | 703 |
+--------------------+----------+
20 rows in set (0.032 sec)
That is (maybe) correct. And then this is wrong
MySQL [(none)]> SELECT count(*) FROM `la8_items_upcaft` WHERE struct_category = 'modernart' OPTION accurate_aggregation=1;
+----------+
| count(*) |
+----------+
| 17571 |
+----------+
1 row in set (0.007 sec)
What if you add threads=1 to the both queries?
Here it is
MySQL [(none)]> SELECT count(*) FROM `la8_items_upcaft` WHERE struct_category = 'modernart' OPTION accurate_aggregation=1, threads=1;
+----------+
| count(*) |
+----------+
| 17773 |
+----------+
1 row in set (0.012 sec)
MySQL [(none)]> SELECT struct_category, count(*) FROM la8_items_upcaft GROUP BY struct_category OPTION accurate_aggregation=1, threads=1;
+--------------------+----------+
| struct_category | count(*) |
+--------------------+----------+
| bookmanuscr | 15290 |
| silverware | 21285 |
| art_unknown_period | 30767 |
| watchclock | 25014 |
| modernart | 60548 |
| antiqfurn | 40132 |
| jewel | 93737 |
| contempart | 54578 |
| misc | 386485 |
| ethnoart | 945 |
| car | 1419 |
| carpetrug | 17367 |
| designfurn | 20110 |
| collectibles | 41043 |
| asianart | 13607 |
| tableware | 17968 |
| oldmasterart | 9592 |
| glassware | 9993 |
| arms | 6690 |
| ancientart | 703 |
+--------------------+----------+
20 rows in set (0.065 sec)
I also tried with only threads
without accurate_aggregation
, same result
Looks really wrong. Can you share the table and your config with us by sending it to our write-only s3?
@Lot-Art Hi. Any news on this? We'd really like to look deeper into the issue, but we need your help.
Sorry for being absent, I was looking into giving it to you. This is in production, containing proprietary data from different sources. Sharing is a big legal hassle even with an NDA, so unfortunately I can’t send it to anyone, not even privately. But i will rebuild the index on Monday night and will tell you if that fixed it. Thank you for the help so far.
I see. Can you please check the table with indextool --check
before you rebuild it?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Feel free to re-open the issue in case it becomes actual.
Describe the bug I think the screenshot explains it better
To Reproduce
SELECT count(*) ... WHERE attrib='value'
and a 'SELECT attrib, count(*) ... WHERE id>0 GROUP BY attrib`Expected behavior Counts should be the same same
Describe the environment: