skent259 / crapssim

Simulator for craps with various betting strategies
MIT License
30 stars 14 forks

Random Module #34

Closed amortization closed 1 day ago

amortization commented 2 years ago

Are we married to using numpy.random for rolling the dice for some statistical reason, or can that be switched to the standard Python random library instead? I assume that if you need the dice roll result as an array, you can use a numpy function to convert a tuple to an array.

I ask because numpy arrays aren't hashable, so they can't be used as dictionary keys. numpy.random also isn't thread safe according to https://stackoverflow.com/questions/7029993/differences-between-numpy-random-and-random-random-in-python, so it wouldn't be usable for a multi-threaded run (which would seemingly be much faster and was mentioned in #14). Also, most of our tests use tuples as the type for dice rolls in fixed_roll, whereas the result of roll is a numpy array.
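
To illustrate the hashability point, here is a minimal sketch. The `payouts` dictionary is a hypothetical stand-in for illustration and is not part of crapssim.

```python
import numpy as np

roll_array = np.array([3, 4])
roll_tuple = (3, 4)

# Hypothetical lookup table keyed by dice roll, for illustration only.
payouts = {roll_tuple: "seven"}

try:
    payouts[roll_array] = "seven"
    array_is_hashable = True
except TypeError:
    # numpy arrays are mutable and define no hash, so this branch runs.
    array_is_hashable = False

# Converting to a tuple of plain Python ints gives a usable key.
key = tuple(int(d) for d in roll_array)
looked_up = payouts[key]
```

The conversion costs one small tuple per roll, which seems a modest price for being able to use results as keys.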

skent259 commented 2 years ago

Thanks for bringing this up. There's a couple levels to your question.

First, the new numpy implementation of its random module uses the PCG algorithm for bit generation, which is known to be statistically better than the Mersenne Twister algorithm that the Python random library uses. See https://www.pcg-random.org for some discussion. I think the Stack Overflow answer you link to has some outdated info, because it says that numpy uses Mersenne Twister too, and that's no longer the case: https://numpy.org/doc/stable/reference/random/bit_generators/index.html

However, in looking further at this, I realized that the current implementation using randint() defaults to numpy's old random method, which is no better than python random. Statistically, this is not ideal and it should be changed to use the new algorithms:

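import numpy as np

rng = np.random.default_rng()   # new Generator API (PCG64 by default)
rng.integers(1, 7, size=2)      # two dice; the upper bound 7 is exclusive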

I'm not sure if the new numpy implementation works better with multi-threading. It seems it's possible to do in some fashion: https://numpy.org/doc/stable/reference/random/multithreading.html. As I understand it, it only becomes unusable for multi-threaded runs when a seed is set.
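
One way to combine a seed with several workers is to give each worker its own generator spawned from a single `SeedSequence`, which NumPy provides for this purpose. The sketch below assumes that approach; the helper names `roll_many` and `run` are made up for illustration, and whether this fits crapssim's design is an open question.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def roll_many(seed_seq, n_rolls):
    """Roll two dice n_rolls times using a generator private to this worker."""
    rng = np.random.default_rng(seed_seq)
    rolls = rng.integers(1, 7, size=(n_rolls, 2))
    return [tuple(int(d) for d in roll) for roll in rolls]


def run(seed, n_workers=4, n_rolls=5):
    """Spawn one independent child seed per worker from a single parent seed."""
    children = np.random.SeedSequence(seed).spawn(n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in input order, regardless of thread timing.
        return list(pool.map(roll_many, children, [n_rolls] * n_workers))


first = run(1234)
second = run(1234)
```

Because no generator is shared between threads, the same parent seed should reproduce the same per-worker rolls on each run.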

In terms of pros/cons, I'd rather have statistically valid random numbers than perfectly-working, seeded, multi-thread support at this time.

Where would the numpy arrays need to be used as dictionary keys? I think dice.fixed_roll() should still take any iterable, but I don't mind if we convert the output of rng.integers() to an array or tuple if that's beneficial.

amortization commented 2 years ago

> Where would the numpy arrays need to be used as dictionary keys? I think dice.fixed_roll() should still take any iterable, but I don't mind if we convert the output of rng.integers() to an array or tuple if that's beneficial.

In theory, I think the fastest way to get the outcome for bets is going to be a dictionary lookup keyed on a hash of all the relevant variables, with the dice roll result being the main variable for most outcomes. It's also about mutability: numpy arrays are mutable, and the result of a dice roll should really be immutable, since results shouldn't change (the dice should change, but results shouldn't). We're also going to run into issues if we ever want the data stored (JSON, SQL, XML, etc.), as we would generally have to convert the array to a plain Python type first. And as long as results are arrays, they can't be used in a set or as any sort of identifying key for other objects.

I actually see a ton of value in allowing a seeded group of rolls. Regression testing would be as easy as running a seeded set of rolls prior to the change, then running the same rolls after the change and checking for the same outcomes.

I think the best answer is probably going to be storing the output of rng.integers as a tuple in order to get both the better randomness and the immutability/hashability.
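
As a rough sketch of the lookup idea, assuming results are stored as tuples: the single "wins on seven" bet and the `OUTCOMES` table below are hypothetical and only show the shape of the approach.

```python
import json


def build_table():
    """Precompute the outcome of a hypothetical 'wins on seven' bet for every roll."""
    table = {}
    for d1 in range(1, 7):
        for d2 in range(1, 7):
            table[(d1, d2)] = "win" if d1 + d2 == 7 else "lose"
    return table


OUTCOMES = build_table()

result = (3, 4)
outcome = OUTCOMES[result]        # plain dictionary lookup, no branching

# Tuples can also go into a set; the duplicate (3, 4) collapses.
seen = {(3, 4), (4, 3), (3, 4)}

# JSON has no tuple type, so the roll is written as a list and
# converted back to a tuple after loading.
encoded = json.dumps(result)
decoded = tuple(json.loads(encoded))
```

A real table would presumably need more than the roll in its key (the point, for example), but the key would still have to be hashable.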
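
A seeded regression check could look something like the sketch below. `seeded_rolls` is a hypothetical helper, not an existing crapssim function.

```python
import numpy as np


def seeded_rolls(seed, n_rolls):
    """Return n_rolls two-dice results drawn from a generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return [
        tuple(int(d) for d in rng.integers(1, 7, size=2))
        for _ in range(n_rolls)
    ]


# Run once before a code change and once after; the rolls should match.
baseline = seeded_rolls(42, 100)
after_change = seeded_rolls(42, 100)
rolls_match = baseline == after_change
```

One caveat: as far as I know, NumPy does not promise that Generator output stays identical across NumPy versions, so a stored baseline is safest when compared under the same NumPy version.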

skent259 commented 2 years ago

To be clear, I see a lot of value in seeded groups of rolls (and it's easy to pass to default_rng()). The seeding only breaks down when you try to multi-thread things, and that I'm okay with not having a perfect solution for.

Storing the output as a tuple seems fine to me.

I'm envisioning something like this:

import typing
import numpy as np

class Dice:
    """
    Simulate the rolling of two dice

    Attributes
    ----------
    n_rolls : int
        Number of rolls for the dice
    result : tuple of int, length 2, or None
        Most recent outcome of the roll of two dice (None before the first roll)
    total : int
        Sum of dice outcome

    """
    def __init__(self, seed=None) -> None:
        self._result: typing.Optional[typing.Tuple[int, ...]] = None
        self.n_rolls: int = 0
        # seed=None draws fresh entropy; pass an int for reproducible rolls
        self.rng = np.random.default_rng(seed)

    @property
    def total(self) -> int:
        return sum(self.result)

    @property
    def result(self) -> typing.Optional[typing.Tuple[int, ...]]:
        return self._result

    def roll(self) -> None:
        self.n_rolls += 1
        # Store plain ints in a tuple so the result is immutable and hashable
        self._result = tuple(int(d) for d in self.rng.integers(1, 7, size=2))

    def fixed_roll(self, outcome: typing.Iterable[int]) -> None:
        self.n_rolls += 1
        # Accept any iterable, but store it as a tuple like roll() does
        self._result = tuple(outcome)

skent259 commented 1 day ago

Closed with #42 and 437970edf4b6027eebd133177633f7191de59a61