Luck factor being computed many times for every participant

CodeWithMichal commented 2 years ago

Hi, I posted a detailed explanation of my concerns on Canny: https://polkastarter.canny.io/bug-reports/p/in-depth-analysis-of-lottery-algorithm

To summarize: according to https://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf, the luck factor should be computed only once per participant.

What's wrong in the current implementation:
- The luck factor is computed many times for every participant, which makes the result much less random.
- The shuffled_eligible_participants method returns a non-unique list of participants, meaning some users are omitted.

CodeWithMichal commented 2 years ago

@tiagom87 @miguelcma any update?

CodeWithMichal commented 2 years ago

@tiagom87 @miguelcma 27 days without any response

CodeWithMichal commented 2 years ago

@tiagom87 @miguelcma so you decided to remove whole bug section from canny instead of answering. 👍

miguelcma commented 2 years ago

Hey @LabuzzMichal, I'm really sorry for our delay. Somehow I didn't notice this issue. Thank you for your detailed review of the code.

If you take a look at LotteryService#calculate_winners you'll notice that uniqueness is ensured by the line lib/services/lottery_service.rb:86: winners.uniq.first(max_winners)

We're aware that this is not the most efficient algorithm (because it creates a huge array in memory), but it's 100% correct. It was also audited by external entities/people more than once (both the code and the results) and all of them conclude the same correctness. A more efficient way would be to use a Ruby Set instead of an Array, but we want a much more efficient algorithm: we're working on refactoring to a more mathematical approach instead of using Sets or Arrays. However, keep in mind that this refactor will only affect efficiency, not accuracy, because the algorithm is already correct.

Thank you again for your thoughts

CodeWithMichal commented 2 years ago

Hi @miguelcma,

thank you for the update! I hope this time answering won't take that long :)

> It was also audited by external entities/people more than once (both the code and the results) and all of them conclude the same correctness.

And now it was audited once again by myself and it failed. Please refer to my findings and answer them instead of saying "it was audited". And guys PLEASE treat it with due attention! This kind of bug might potentially lead to financial losses for those who act based on wrong assumptions

> If you take a look at LotteryService#calculate_winners you'll notice that uniqueness is ensured by the line lib/services/lottery_service.rb:86: winners.uniq.first(max_winners)

Somehow from the whole wall of text on canny, you decided to refer to uniqueness, which I already said that IN THE END it's unique but it's still an issue! What about all other concerns I have?

@miguelcma Could you please reopen this issue and keep it open until we both agree on whether it's an issue or not? Let's discuss!

Let me repeat it again, this time in a much more technical manner. I want to start with lottery algorithm assumptions, then discuss a correct version of the algorithm for such a problem and at the end, I will show what exactly is wrong with the current implementation

Assumptions

There are N participants in the lottery
Every participant has M tickets
There are K lottery winners (entries)
Each participant might win only one entry

Weighted Random Sampling (WRS) algorithm

Mathematically, this problem can be simplified to sampling K items from a population of N weighted items. This algorithm is taken directly from this publication: link

Possible Ruby implementation:

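  winners += weighted_random_sample(participants)
  winners.first(max_winners)

  # Every participant gets exactly one random key; more tickets push the key
  # closer to 1.0, so the list is ordered with the highest keys first.
  def weighted_random_sample(participants)
    sorted = participants.sort_by do |participant|
      rand ** (1.0 / participant.tickets)
    end
    sorted.reverse
  end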

Human-friendly code explanation:

Take a list of participants
For every participant, compute a value based on a luck factor (rand) and the number of tickets
Sort participants in descending order by the value computed in the previous step
Take the first K entries from the sorted list
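The steps above can be put into a self-contained, runnable form. This is only a minimal sketch: the Participant struct, the names, the ticket counts and the seed are hypothetical and are not taken from the repository. It uses Enumerable#max_by with a count, which is equivalent to sorting by the key in descending order and taking the first K entries.

```ruby
# Hypothetical participant record; only `tickets` is used as the weight.
Participant = Struct.new(:name, :tickets)

# One key per participant: key = u ** (1.0 / tickets), with u drawn once.
# The k participants with the largest keys are returned, largest first.
def weighted_random_sample(participants, k, rng: Random.new)
  participants.max_by(k) do |participant|
    rng.rand ** (1.0 / participant.tickets)
  end
end

participants = [
  Participant.new("alice", 1),
  Participant.new("bob", 5),
  Participant.new("carol", 20)
]

# A fixed seed (an arbitrary choice) makes the run reproducible.
winners = weighted_random_sample(participants, 2, rng: Random.new(42))
puts winners.map(&:name).inspect
```

Because every participant receives exactly one key, the returned list cannot contain duplicates, so no uniq step is needed in this variant.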

Your implementation

  winners += shuffled_eligible_participants
  winners.uniq.first(max_winners)

  def weighted_random_sample(participants)
    participants.max_by do |participant|
      rand ** (1.0 / participant.tickets)
    end
  end

  def shuffled_eligible_participants
    (max_winners * 5).times.map do
      weighted_random_sample(participants)
    end
  end

Human-friendly code explanation:

Run a loop 5 x max_winners times
In every iteration, choose one winner
In every iteration, perform the WRS algorithm but keep only the participant with the highest computed value
After all iterations, remove duplicate entries and take the first K entries
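For experimenting outside the service class, the same steps can be written as a standalone script. This is a rough sketch that mirrors the snippet quoted above, not the repository code itself: the Participant struct, the method names draw_one and repeated_draw_winners, the sample data and the seed are all hypothetical.

```ruby
# Hypothetical participant record; only `tickets` is used as the weight.
Participant = Struct.new(:name, :tickets)

# A single draw: fresh random keys are generated for everyone on each call
# and only the participant with the highest key is kept.
def draw_one(participants, rng)
  participants.max_by do |participant|
    rng.rand ** (1.0 / participant.tickets)
  end
end

# Repeats the single draw 5 x max_winners times, then removes duplicates
# and keeps at most max_winners entries in the order they were drawn.
def repeated_draw_winners(participants, max_winners, rng: Random.new)
  draws = Array.new(max_winners * 5) { draw_one(participants, rng) }
  draws.uniq.first(max_winners)
end

participants = [
  Participant.new("alice", 1),
  Participant.new("bob", 5),
  Participant.new("carol", 20)
]

winners = repeated_draw_winners(participants, 2, rng: Random.new(7))
puts winners.map(&:name).inspect
```

Note that the result can be shorter than max_winners if the draws happen to contain fewer distinct participants than that.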

Comparison and conclusions

The current implementation of the lottery algorithm is NOT a ticket-based algorithm. It is some variation, but the results are NOT the same. You created a completely new algorithm. Is it described somewhere? Do you have any evidence it works?
The current implementation doesn't guarantee that K winners will be selected. That's because the method shuffled_eligible_participants doesn't guarantee uniqueness, meaning that in an extremely unlucky scenario shuffled_eligible_participants might return a list of (5 x max_winners) entries with only a few unique participants. Yes, I know it's extremely unlucky, and yes, I know that's why you are running the loop 5 x max_winners times, but there is still a possibility.
The current implementation "preselects" 5 x max_winners entries, where some participants might be chosen many times (uniqueness not guaranteed). I don't get the logic behind that. With the WRS algorithm, uniqueness is guaranteed and you don't need to worry whether it's statistically correct or not.
The current implementation favors small wallets. Why? Because the number of tickets is not as important as it should be.
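Claims like these can be checked empirically by running both approaches many times and comparing how often each participant ends up among the winners. The script below is only a sketch under assumptions: the Participant struct, the ticket counts, the number of trials and the seed are hypothetical, and both methods are simplified stand-ins for the two snippets above rather than the repository code.

```ruby
# Hypothetical participant record; only `tickets` is used as the weight.
Participant = Struct.new(:name, :tickets)

# Variant 1: one random key per participant, top k keys win.
def one_key_winners(participants, k, rng)
  participants.max_by(k) { |participant| rng.rand ** (1.0 / participant.tickets) }
end

# Variant 2: 5 x k independent single draws, then uniq and first k.
def repeated_draw_winners(participants, k, rng)
  draws = Array.new(k * 5) do
    participants.max_by { |participant| rng.rand ** (1.0 / participant.tickets) }
  end
  draws.uniq.first(k)
end

# Fraction of trials in which each participant is among the winners.
# The block receives (participants, k, rng) and returns a list of winners.
def win_rates(participants, k, trials, rng)
  counts = Hash.new(0)
  trials.times do
    yield(participants, k, rng).each { |winner| counts[winner.name] += 1 }
  end
  participants.to_h do |participant|
    [participant.name, counts[participant.name].fdiv(trials)]
  end
end

tickets = [1, 2, 3, 5, 8, 13]
participants = tickets.map { |count| Participant.new("wallet_#{count}", count) }
k = 2
trials = 20_000
rng = Random.new(1)

one_key = win_rates(participants, k, trials, rng) { |ps, n, r| one_key_winners(ps, n, r) }
repeated = win_rates(participants, k, trials, rng) { |ps, n, r| repeated_draw_winners(ps, n, r) }

participants.each do |participant|
  name = participant.name
  puts format("%-10s one-key: %.3f  repeated-draw: %.3f", name, one_key[name], repeated[name])
end
```

If the two columns agree within sampling noise, the two approaches give the same win probabilities for that set of weights; a systematic gap would support the concerns listed above. With very few participants and heavily skewed ticket counts, the 5 x max_winners cap can leave the second variant with fewer than K winners, which would show up as lower rates in its column.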

Final thoughts

You might say that this analysis is unfounded without real tests. Fine! Give me a few days to open a PR with a comparison in code. @miguelcma @tiagom87 kindly address the above

Edit: post updated

miguelcma commented 2 years ago

Thank you so much for taking a closer look at this. I'm reopening the issue as you suggest, to allow the discussion to happen. Let us analyse this in detail with the rest of the team, and also invite them to participate in the discussion.

CodeWithMichal commented 2 years ago

Hi @miguelcma ,

I checked the simulations I did once again, and you were right: the results are fine and your algorithm is working correctly. It is a little bit overcomplicated, but the results are fine. I made a mistake in my code when I was checking it.

I feel ashamed and I am sorry for bothering you. The issue can be closed.

miguelcma commented 2 years ago

No worries at all @LabuzzMichal !

Discussion is great, and having input from the community is crucial for quality, which we take very seriously. Not only to be able to improve the code, but also to make sure that everything is accurate. So, your input was very important 🙌 thank you

tiagom87 commented 2 years ago

Thank you for the proactivity @LabuzzMichal !

polkastarter / polkastarter-lottery