python / cpython

The Python programming language
https://www.python.org
Other
63.86k stars 30.56k forks source link

Pure Python implementation of random #60863

Closed serhiy-storchaka closed 11 years ago

serhiy-storchaka commented 11 years ago
BPO 16659
Nosy @brettcannon, @rhettinger, @mdickinson, @scoder, @alex, @serhiy-storchaka
Files
  • random_pure_python_3.patch
  • random_pure_python_5.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at = created_at = labels = ['type-feature', 'library'] title = 'Pure Python implementation of random' updated_at = user = 'https://github.com/serhiy-storchaka' ``` bugs.python.org fields: ```python activity = actor = 'alex' assignee = 'rhettinger' closed = True closed_date = closer = 'rhettinger' components = ['Library (Lib)'] creation = creator = 'serhiy.storchaka' dependencies = [] files = ['28283', '28303'] hgrepos = [] issue_num = 16659 keywords = ['patch'] message_count = 16.0 messages = ['177315', '177326', '177329', '177330', '177337', '177342', '177386', '177432', '178645', '178655', '178676', '178684', '178722', '183916', '187084', '187085'] nosy_count = 7.0 nosy_names = ['brett.cannon', 'rhettinger', 'mark.dickinson', 'scoder', 'Arfrever', 'alex', 'serhiy.storchaka'] pr_nums = [] priority = 'normal' resolution = 'rejected' stage = 'patch review' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue16659' versions = ['Python 3.4'] ```

    serhiy-storchaka commented 11 years ago

    C implemented part of random module does not depend on any specific C level abilities. Here is a patch which implements this part on pure Python. May be this is not a best implementation. And I don't know how write tests for both implementations.

    mdickinson commented 11 years ago

    I wonder whether it would make sense to use an array to hold the MT state, for a closer match with the C code. (Not sure whether this would have a noticeable effect on performance.)

    serhiy-storchaka commented 11 years ago

    Patch updated. Tests now test both implementation. Thank Ezio for tip.

    serhiy-storchaka commented 11 years ago

    I wonder whether it would make sense to use an array to hold the MT state, for a closer match with the C code.

    I don't think it makes sense. The algorithm is same and list is more natural for Python. Also arrays a little slower than lists, but it doesn't matter, because Python implementation of random slower C implementation anyway.

    brettcannon commented 11 years ago

    To see how to write tests that exercise both the C and Python versions look at test_heapq and test_warnings as examples. It will require some refactoring, but it's easy to do.

    serhiy-storchaka commented 11 years ago

    Yes, I used test_heapq and test_decimal as an example.

    serhiy-storchaka commented 11 years ago

    Patch updated. Comments and docstrings a little enhanced. Thanks Brett for review.

    Also Python implementation of core generator now is threadsafe (in particular random() and getrandbits() methods) as its C implementation and its private members now are more hidden (to prevent unintentional conflicts). See differences between patches for details.

    serhiy-storchaka commented 11 years ago

    Patch updated. One bug fixed.

    Also I made some benchmarks. The pure Python random() about 100 times slower and getrandbits(32) about 13 times slower than the C implementation.

    rhettinger commented 11 years ago

    Brett, if all the other Python implementations already have a working implementation of random, what purpose is served by adding a pure python version of the Mersenne Twister than won't be run by anyone?

    scoder commented 11 years ago

    FWIW, PyPy has an (R)Python implementation already:

    https://bitbucket.org/pypy/pypy/src/default/pypy/rlib/rrandom.py

    brettcannon commented 11 years ago

    In response to Raymond:

    First, Serhiy is a core developer now, so if he wants to commit this code and maintain it I have no objections as it doesn't detract from anything and the maintenance burden is his if he wants it. Whether any of us view it as the best use of his time or not is not our call and we can't stop him. =)

    Second, while PyPy may have an RPython implementation, it's originally from 2006, has already been patched by them twice in 2011 for bugs, and may not be needed by them anymore based on current performance characteristics of PyPy today in lieu of this code (and that's assuming they wrote the code in RPython originally for a specific reason compared to just needing something that worked, but this is all a guess w/o actually benchmarking).

    Third, I can't predict future VMs and their needs. It might not be used by a VM today (unless PyPy starts using it for their py3k work), but who knows what the future holds? As I said, Serhiy already wrote the code and is the core dev who will maintain it if it goes in so I don't see a maintenance burden here that is being hoisted upon python-dev.

    Fourth, I added a comment to issue bpo-16651 to state that people should see what the other VMs already have and to start a conversation first before moving forward with a Python port to make sure no one views it as a waste of time.

    serhiy-storchaka commented 11 years ago

    I don't want to make a decision on the inclusion of this code. However, I will undertake to maintain it. I'm going to fix one algorithmic bug in current implementation and add C implementations for some methods which significantly slowed in Python implementation. This can be done without the committing of this patch, but the dual test the two implementations will make the code more reliable. Even if Python implementation is not to be used, it will help in maintaining C implementation.

    rhettinger commented 11 years ago

    Okay thanks. I'll look over the patch in detail over the next few days.

    FYI, I'm the maintainer of this module (and was the one to add the original Mersenne Twister code).

    rhettinger commented 11 years ago

    [Serhiy Storchaka]

    I don't want to make a decision on the inclusion of this code.

    It was a tough call, but I don't want to add this code. You've a great job with it, but I don't think it is a worthwhile endeavor.

    -----------

    P.S. I did have a question about the patch.

    The code uses an RLock. Where are the places that can trigger reentracy?

    With in a single method, successive calls to _genrand_int32() are ordered and can't be interleaved with reentrancy while still keeping the original order of generated random numbers.

    brettcannon commented 11 years ago

    I was talking with Alex Gaynor about the Python implementation of operator (http://bugs.python.org/issue16694) and asked about this bug since Raymond said the fact PyPy had an RPython implementation was a knock against bothering with this. Alex said if performance could be shown to be good then PyPy would be willing to consider dropping their accelerated version and switch to this (http://bugs.python.org/issue16694). So if someone is so motivated, doing a benchmark showing whether PyPy's accelerated version (which is relatively new so you would need to probably grab 2.0 or maybe 1.9) is faster/same/slower than this pure Python version would be nice to have.

    alex commented 11 years ago

    Looking at the patch (haven't actually benchmarked it), I have two concerns with respect to performance:

    a) The need for locking, this doesn't exist in the C or RPython versions because of the GIL. That locking is going to be distinctly un-free. b) The need for manually masking overflowing arithmetic (yes I know, everything is a long, but just looking at it algorithmically, we really want the 2s complement).

    I don't have an opinion about how to solve either of these, but without a solution I doubt performance will ever be competitive. I think it would be a mistake to assume these issues are specific to this patch, they strike me as generally applicable issues.