manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
8.83k stars 489 forks source link

Bug on rand() function #1023

Open daw3gr opened 1 year ago

daw3gr commented 1 year ago

Left this potential bug on the manticore forum but unsure if it is unattended as there was no response.

I have a query for an index like this where for showing the behaviour seed is always 1:

select id, weight() as w, rand(1) as r, if ( (now() - fe) < 7776000, 100,0 ) as new, if ( (now() -fe_xtra) < 8640060 , 1000,0 ) as bumped, weight()(1+ new + bumped ) as composite_bumper from spv_101001001 WHERE pwt = 65 order by r desc limit 0, 20;

For the given dataset, If I ran the query a number of times, while r returns always the same values as expected, I get two different sets of results, and always two…

Result 1:

±-------±-----±-----------±-----±-------±-----------------+
| id | w | r | new | bumped | composite_bumper |
±-------±-----±-----------±-----±-------±-----------------+
| 99816 | 1 | 0.99983668 | 100 | 1000 | 1101 |
| 96770 | 1 | 0.99940014 | 0 | 0 | 1 |
| 93531 | 1 | 0.999071 | 0 | 0 | 1 |
| 101062 | 1 | 0.99886215 | 100 | 1000 | 1101 |
| 87562 | 1 | 0.99873304 | 100 | 1000 | 1101 |
| 102042 | 1 | 0.99870396 | 100 | 1000 | 1101 |
| 99630 | 1 | 0.99846452 | 100 | 1000 | 1101 |
| 100220 | 1 | 0.99795610 | 100 | 1000 | 1101 |
| 93560 | 1 | 0.997609 | 0 | 0 | 1 |
| 96801 | 1 | 0.99755889 | 0 | 0 | 1 |
| 98060 | 1 | 0.99746341 | 100 | 1000 | 1101 |
| 98071 | 1 | 0.99724239 | 100 | 1000 | 1101 |
| 98327 | 1 | 0.99717653 | 100 | 1000 | 1101 |
| 102389 | 1 | 0.99688619 | 100 | 1000 | 1101 |
| 88323 | 1 | 0.99667263 | 0 | 0 | 1 |
| 101370 | 1 | 0.99590409 | 100 | 1000 | 1101 |
| 95893 | 1 | 0.99539691 | 0 | 0 | 1 |
| 96771 | 1 | 0.99515986 | 0 | 0 | 1 |
| 93596 | 1 | 0.99505574 | 0 | 0 | 1 |
| 100857 | 1 | 0.99495226 | 100 | 1000 | 1101 |

Result2:

±-------±-----±-----------±-----±-------±-----------------+
| id | w | r | new | bumped | composite_bumper |
±-------±-----±-----------±-----±-------±-----------------+
| 99826 | 1 | 0.99983668 | 100 | 1000 | 1101 |
| 96836 | 1 | 0.99940014 | 0 | 0 | 1 |
| 93542 | 1 | 0.999071 | 0 | 0 | 1 |
| 101090 | 1 | 0.99886215 | 100 | 1000 | 1101 |
| 88039 | 1 | 0.99873304 | 0 | 0 | 1 |
| 102052 | 1 | 0.99870396 | 100 | 1000 | 1101 |
| 99640 | 1 | 0.99846452 | 100 | 1000 | 1101 |
| 100230 | 1 | 0.99795610 | 100 | 1000 | 1101 |
| 93576 | 1 | 0.997609 | 0 | 0 | 1 |
| 96841 | 1 | 0.99755889 | 100 | 0 | 101 |
| 98078 | 1 | 0.99746341 | 100 | 1000 | 1101 |
| 98081 | 1 | 0.99724239 | 100 | 1000 | 1101 |
| 98406 | 1 | 0.99717653 | 100 | 1000 | 1101 |
| 102399 | 1 | 0.99688619 | 100 | 1000 | 1101 |
| 88580 | 1 | 0.99667263 | 0 | 0 | 1 |
| 101380 | 1 | 0.99590409 | 100 | 1000 | 1101 |
| 95912 | 1 | 0.99539691 | 100 | 1000 | 1101 |
| 96837 | 1 | 0.99515986 | 0 | 0 | 1 |
| 93619 | 1 | 0.99505574 | 0 | 0 | 1 |
| 100867 | 1 | 0.99495226 | 100 | 1000 | 1101 |

The results should be unique, this prevents consistent pagination through results.

I have run the same query with different seeds and what I can see is that the id for the first result is always + or - 10 of the other set of results, for some of the remaining results this is also the case, in this example the last result exhibits the same behaviour.

Version info:

Server version: 5.0.2 348514c@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522) git branch HEAD (no branch)

One other thing that makes me suspect that its a big is that if I modify the search a bit, then, occasionally, I get rand() to be 0.0000 for some or all of the results as follows:

select id, weight() as w, rand(), if ( (now() - fe) < 7776000, 100,0 ) as new, if ( (now() -fe_xtra) < 8640060 , 1000,0 ) as bumped, weight()(1+ new + bumped ) as composite_bumper from spv_101001001 WHERE pwt = 65 limit 0, 20 option rand_seed=1;

±------±-----±-----------±-----±-------±-----------------+
| id | w | rand() | new | bumped | composite_bumper |
±------±-----±-----------±-----±-------±-----------------+
| 13806 | 1 | 0.12649618 | 0 | 0 | 1 |
| 17852 | 1 | 0.20872745 | 0 | 0 | 1 |
| 31937 | 1 | 0.46578056 | 0 | 0 | 1 |
| 48448 | 1 | 0.62003821 | 0 | 0 | 1 |
| 52590 | 1 | 0.87326658 | 0 | 0 | 1 |
| 52598 | 1 | 0.06641738 | 0 | 0 | 1 |
| 55667 | 1 | 0.993506 | 100 | 1000 | 1101 |
| 55668 | 1 | 0.91852927 | 100 | 1000 | 1101 |
| 56108 | 1 | 0.93055731 | 100 | 1000 | 1101 |
| 56109 | 1 | 0.17995529 | 100 | 1000 | 1101 |
| 63315 | 1 | 0.000000 | 100 | 1000 | 1101 |
| 63316 | 1 | 0.000000 | 100 | 1000 | 1101 |
| 64423 | 1 | 0.000000 | 100 | 0 | 101 |
| 71741 | 1 | 0.000000 | 100 | 1000 | 1101 |
| 71823 | 1 | 0.000000 | 0 | 0 | 1 |
| 72927 | 1 | 0.000000 | 0 | 0 | 1 |
| 73026 | 1 | 0.000000 | 0 | 0 | 1 |
| 74769 | 1 | 0.000000 | 0 | 0 | 1 |
| 75013 | 1 | 0.000000 | 0 | 0 | 1 |
| 75729 | 1 | 0.000000 | 100 | 1000 | 1101 |
sanikolaev commented 1 year ago

Try adding option threads=1. Does it help?

daw3gr commented 1 year ago

Hi @sanikolaev, thanks, it does indeed, repeated the queries a number of times and now the results are consistent at all times. What is the cause of this behaviour? Thanks again.

sanikolaev commented 1 year ago

As said in the [docs](https://manual.manticoresearch.com/Functions/Mathematical_functions#RAND()):

If you use the seed take into account that it resets rand()'s starting point separately for each plain index / RT disk / RAM chunk / pseudo shard, so queries to a distributed index in any form can return multiple identical random values.

I'm not sure about the zeroes though. Is there a simple way to reproduce them with a synthetic case like this?

create table ...
insert into ...
select ...
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Feel free to re-open the issue in case it becomes actual.