Closed serhiy-storchaka closed 11 years ago
C implemented part of random module does not depend on any specific C level abilities. Here is a patch which implements this part on pure Python. May be this is not a best implementation. And I don't know how write tests for both implementations.
I wonder whether it would make sense to use an array to hold the MT state, for a closer match with the C code. (Not sure whether this would have a noticeable effect on performance.)
Patch updated. Tests now test both implementation. Thank Ezio for tip.
I wonder whether it would make sense to use an array to hold the MT state, for a closer match with the C code.
I don't think it makes sense. The algorithm is same and list is more natural for Python. Also arrays a little slower than lists, but it doesn't matter, because Python implementation of random slower C implementation anyway.
To see how to write tests that exercise both the C and Python versions look at test_heapq and test_warnings as examples. It will require some refactoring, but it's easy to do.
Yes, I used test_heapq and test_decimal as an example.
Patch updated. Comments and docstrings a little enhanced. Thanks Brett for review.
Also Python implementation of core generator now is threadsafe (in particular random() and getrandbits() methods) as its C implementation and its private members now are more hidden (to prevent unintentional conflicts). See differences between patches for details.
Patch updated. One bug fixed.
Also I made some benchmarks. The pure Python random() about 100 times slower and getrandbits(32) about 13 times slower than the C implementation.
Brett, if all the other Python implementations already have a working implementation of random, what purpose is served by adding a pure python version of the Mersenne Twister than won't be run by anyone?
FWIW, PyPy has an (R)Python implementation already:
https://bitbucket.org/pypy/pypy/src/default/pypy/rlib/rrandom.py
In response to Raymond:
First, Serhiy is a core developer now, so if he wants to commit this code and maintain it I have no objections as it doesn't detract from anything and the maintenance burden is his if he wants it. Whether any of us view it as the best use of his time or not is not our call and we can't stop him. =)
Second, while PyPy may have an RPython implementation, it's originally from 2006, has already been patched by them twice in 2011 for bugs, and may not be needed by them anymore based on current performance characteristics of PyPy today in lieu of this code (and that's assuming they wrote the code in RPython originally for a specific reason compared to just needing something that worked, but this is all a guess w/o actually benchmarking).
Third, I can't predict future VMs and their needs. It might not be used by a VM today (unless PyPy starts using it for their py3k work), but who knows what the future holds? As I said, Serhiy already wrote the code and is the core dev who will maintain it if it goes in so I don't see a maintenance burden here that is being hoisted upon python-dev.
Fourth, I added a comment to issue bpo-16651 to state that people should see what the other VMs already have and to start a conversation first before moving forward with a Python port to make sure no one views it as a waste of time.
I don't want to make a decision on the inclusion of this code. However, I will undertake to maintain it. I'm going to fix one algorithmic bug in current implementation and add C implementations for some methods which significantly slowed in Python implementation. This can be done without the committing of this patch, but the dual test the two implementations will make the code more reliable. Even if Python implementation is not to be used, it will help in maintaining C implementation.
Okay thanks. I'll look over the patch in detail over the next few days.
FYI, I'm the maintainer of this module (and was the one to add the original Mersenne Twister code).
[Serhiy Storchaka]
I don't want to make a decision on the inclusion of this code.
It was a tough call, but I don't want to add this code. You've a great job with it, but I don't think it is a worthwhile endeavor.
The algorithm was designed with C level operations in mind and is awkward in Python.
The code we use for MT has been widely distributed and tested by others. We would just be getting away from the canonical reference version.
The other implementations already have an MT, so this code won't end-up being used and would just add to the maintenance burden. If anyone ever did use it, it would be dog slow.
-----------
P.S. I did have a question about the patch.
The code uses an RLock. Where are the places that can trigger reentracy?
With in a single method, successive calls to _genrand_int32() are ordered and can't be interleaved with reentrancy while still keeping the original order of generated random numbers.
I was talking with Alex Gaynor about the Python implementation of operator (http://bugs.python.org/issue16694) and asked about this bug since Raymond said the fact PyPy had an RPython implementation was a knock against bothering with this. Alex said if performance could be shown to be good then PyPy would be willing to consider dropping their accelerated version and switch to this (http://bugs.python.org/issue16694). So if someone is so motivated, doing a benchmark showing whether PyPy's accelerated version (which is relatively new so you would need to probably grab 2.0 or maybe 1.9) is faster/same/slower than this pure Python version would be nice to have.
Looking at the patch (haven't actually benchmarked it), I have two concerns with respect to performance:
a) The need for locking, this doesn't exist in the C or RPython versions because of the GIL. That locking is going to be distinctly un-free. b) The need for manually masking overflowing arithmetic (yes I know, everything is a long, but just looking at it algorithmically, we really want the 2s complement).
I don't have an opinion about how to solve either of these, but without a solution I doubt performance will ever be competitive. I think it would be a mistake to assume these issues are specific to this patch, they strike me as generally applicable issues.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at =
created_at =
labels = ['type-feature', 'library']
title = 'Pure Python implementation of random'
updated_at =
user = 'https://github.com/serhiy-storchaka'
```
bugs.python.org fields:
```python
activity =
actor = 'alex'
assignee = 'rhettinger'
closed = True
closed_date =
closer = 'rhettinger'
components = ['Library (Lib)']
creation =
creator = 'serhiy.storchaka'
dependencies = []
files = ['28283', '28303']
hgrepos = []
issue_num = 16659
keywords = ['patch']
message_count = 16.0
messages = ['177315', '177326', '177329', '177330', '177337', '177342', '177386', '177432', '178645', '178655', '178676', '178684', '178722', '183916', '187084', '187085']
nosy_count = 7.0
nosy_names = ['brett.cannon', 'rhettinger', 'mark.dickinson', 'scoder', 'Arfrever', 'alex', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue16659'
versions = ['Python 3.4']
```