Closed dfcc25ac-e242-4495-8f49-176193f06a74 closed 7 years ago
When the uuid.py module is simply imported it has the side effect of forking a subprocess (/sbin/ldconfig) and doing a lot of stuff find a uuid implementation by ctypes. This is undesirable in many contexts. It would be better to perform those tasks on demand, when the first UUID is actually requested. In general, imports should avoid unnecessary system call side effects. This also makes testing easier.
I think launching external tools like ifconfig and ipconfig can be avoided pretty easily. There are many recipes around the net how to use native API's. About ctypes' horrible logic during find_library call - don't know yet.
With the attached patch the "heavy work" will be done on request, when calling uuid1() or uuid4() not on import.
I am working off from the py3k svn branch. Is it necessary to submit a separate patch for py2 branch?
Kenny, I don't see a problem with uuid is *imported*, it just creates a couple of STANDARD UUID class objects for use later. And this seems to just set the number and validates it. I don't see any subprocess calls performed. Perhaps you were referring to scenarios of using uuid1/uuid5 methods in mac and suggesting improvements to it by your patch?
If you do 'python -c "import uuid" under strace, _posixsubprocess is definitely loaded, and a pipe2 call is made.
Take a look at the code starting at (py3k trunk) line 418 (try:). That's where the weird stuff happens, which is what the patch is addressing.
Ken: thanks for working on this.
Thanks for posting a patch! I have two comments:
uuid should work even when ctypes is not available A bit of offtopic: why can't we assume that ctypes is available?
A bit of offtopic: why can't we assume that ctypes is available?
Because ctypes (or, actually, the libffi it relies on) needs specific low-level code for each platform it runs on, and not all platforms have such code.
Another reason is that ctypes is dangerous and some administrators might prefer to disable it (especially on shared hosting ala Google App Engine).
Thanks for pointing that out! I guess that is the reason you did the import in a try block.
Maybe I understood and ctypes ImportError simply must be handled and fallbacked to something else. But there are only 3 ways of getting MAC address:
And ctypes seems to be the best choice: it's portable across Python VMs (better that 3) and faster (better than 1).
Maybe I understood and ctypes ImportError simply must be handled and fallbacked to something else.
Indeed.
But there are only 3 ways of getting MAC address:
- using popen
- using ctypes and native calls
- using C API and performing native calls in extension
And ctypes seems to be the best choice: it's portable across Python VMs (better that 3) and faster (better than 1).
Perhaps, but doing without ctypes should still be possible, otherwise it's a regression.
It's also possible using existing wrapped os system calls. One exaple is here: http://code.google.com/p/pycopia/source/browse/trunk/aid/pycopia/ifconfig.py
Although that one doesn't current support MAC addresses, but it could. The socket module also now support the netlink socket on Linux, so it shouldbe possible to use that for getting MAC address on Linux.
It's also possible using existing wrapped os system calls. That's right, on linux we can use ioctls but windows would require win api calls like this one: http://stackoverflow.com/questions/166506/finding-local-ip-addresses-in-python/166992#166992
I'm thinking Python could use a general purpose ifconfig/mac-layer module that uuid.py could then just use.
I'm thinking Python could use a general purpose ifconfig/mac-layer module that uuid.py could then just use.
Perhaps, but that's really out of scope for this issue. Feel free to open another issue.
The patch hasn’t incorporated Antoine’s comments AFAICT.
Also I don’t see this fit for back porting to bug fix releases. Correct me if I’m wrong.
Hynek, you are right.
The implementation does not actually end up in infinite loop, just repeating the loading of the CDLL is slow.
I added caching to that and fixed the ctypes imports too.
Jyrki, roundup doesn’t seem to recognize you patch so we can’t review it in Rietveld. Could you re-try, maybe using hg?
Here's a second take on the patch
One thing I am wondering about is why we have to use find_library() at all. Wouldn’t it be more robust, and more efficient, to call CDLL() directly? We just have to know the exactly library version we are expecting. On Linux, the full soname is libuuid.so.1. It seems on OS X it is called libc.dylib (but it would be good for someone else to confirm).
# The uuid_generate_* routines are provided by libuuid on at least
# Linux and FreeBSD, and provided by libc on Mac OS X.
if sys.platform == "darwin":
libname = "libc.dylib"
else:
libname = "libuuid.so.1"
_ctypes_lib = ctypes.CDLL(libname)
I cannot comment on uuid directly, but for me, this is yet another example of how assumptions can break things.
imho - if you know the exact version of s shared library that you want, calling cdll directly should be find. Maybe find_library is historic.
However, an advantage to calling find_library is that it should lead to an OSError if it cannot be loaded. But trapping OSError on a call to ctypes.cdll or testing for None (NULL) from find_library() is the option of a developer using the ctypes interface.
As far as libFOO.so.N - do you always want to assume it is going to be version N, or are you expecting to be able to work work version N+X? Again, find_library() can give you the library name it would load - but a programmer must be aware of the platform differences (e.g., AIX returns not a filename, but a libFOO.a(libFOO.so.N) - as just one example. p.s. as far as ldconfig being part of the problem on AIX (as it does not exist and can lead to OSError or just long delays, I have worked out a (pending) patch for ctypes/util (and ctypes/cdll) that may address the issues with uuid import on AIX. (see bpo-26439 for the patch)
The versioning problem with libFOO.so.N already occurs with compiled programs. A C program compiled against libuuid.so.1 will fail to load if you only have libuuid.so.2. On the other hand, a Python program using find_library() will find either version. My point about robustness is that if a version 2 is invented, it might have different semantics or function signatures, and Python would then be assuming the wrong semantics.
It sounds like ctypes is causing you some headache. How about we get rid of ctypes for uuid and replace it with a simple implementation in C instead? Autoconf (configure) can take care of the library detection easily.
There is already bpo-20519 for that, although it looks like the proposed patch keeps ctypes as a fall-back. (My interest here is only theoretical.)
The way I have seen that resolved - in many locations, is to have the option to specify a specific version, e.g., libFOO.so.1 and then as long as that version remains available, perhaps as read-only for running with existing programs,they continue to work even when libFOO.so.2. However, for some who believes, or expects, to be able to deal with potential ABI changes - they can specify simply libFOO.so and let the loader decide (what I see is that libFOO.so is a symbolic link to, or a copy of the latest version.)
So, I agree wholeheartedly, if a versioned number is requested, that, or nothing, should be returned. However, if it is generic - try to find a generic named library (and get the link or copy), and when that is not available either, take the latest versioned number.
It has been over two months, and I may have read it wrong - but that appears to be what the current "ldconfig -p" solution implements. (in ctypes, not uuid, so perhaps this is not the correct place to be responding. If so, my apologies).
On 08-May-16 05:24, Martin Panter wrote:
Martin Panter added the comment:
The versioning problem with libFOO.so.N already occurs with compiled programs. A C program compiled against libuuid.so.1 will fail to load if you only have libuuid.so.2. On the other hand, a Python program using find_library() will find either version. My point about robustness is that if a version 2 is invented, it might have different semantics or function signatures, and Python would then be assuming the wrong semantics.
----------
Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue11063\
How often uuid1 is used?
I never use it and it looks uuid1 makes uuid.py complicated. How about split it to _uuid1.py (or uuid/init.py and uuid/_uuid1.py)?
I hope PEP-562 is accepted. It ease splitting out (heavy and slow and dirty) part into submodule without breaking backward compatibility.
Without PEP-562, easy way is making proxy function.
# uuid/init.py
def uuid1(node=None, clock_seq=None):
"""Generate a UUID from a host ID, sequence number, and the current time.
If 'node' is not given, getnode() is used to obtain the hardware
address. If 'clock_seq' is given, it is used as the sequence number;
otherwise a random 14-bit sequence number is chosen."""
from . import _uuid1
return _uuid1.uuid1()
Sorry, I reject my idea. It clearly overdone. uuid.py is not so huge.
It ease splitting out (heavy and slow and dirty) part into submodule without breaking backward compatibility.
Right. I created attached PR bpo-3795 to implement this idea.
I hope PEP-562 is accepted.
I don't think that this PEP is needed here. IMHO a new _uuid1 module is enoguh to avoid "side effects" on "import uuid".
Sorry, I reject my idea. It clearly overdone. uuid.py is not so huge.
Can you please elaborate? Do you think that my PR is wrong?
uuid code contains 4 bare "except:" blocks which should be replaced with appropriate exceptions, or at least "except Exception:" to not catch KeyboardInterrupt. But this should be modified in a second time.
I think https://github.com/python/cpython/pull/3796 is a better resolution. It creates an optional _uuid C extension to avoid ctypes if possible *and* also loads system functions lazily.
I think launching external tools like ifconfig and ipconfig can be avoided pretty easily. There are many recipes around the net how to use native API's.
That's right, but I would prefer to enhance uuid to use native API's in a different issue and focus on avoiding "side effects" on "import uuid" in this issue.
Do you know a native APIs for Windows and Linux? I don't. Do you have links to these "recipes"? If yes, please open a new issue.
Do you know a native APIs for Windows and Linux?
On Linux, we already use uuid_generate_time(). See _unixdll_get_node().
And on Windows, _windll_getnode() calls into _UuidCreate().
Antoine: "I think https://github.com/python/cpython/pull/3796 is a better resolution. It creates an optional _uuid C extension to avoid ctypes if possible *and* also loads system functions lazily."
Implementing bpo-20519 is a very good idea. But I still like the idea of putting all these ugly functions to get the node and geneate a UUID1 object in a different module, since most of these code is not used on most platforms.
Antoine: Would you mind to modify your PR 3796 to only implement bpo-20519? I would prefer to reorganize uuid.py in a second step. It will be easy to review and easy to discuss what is the best option.
It's nice to see things moving in this 6 years old issue :-)
> Sorry, I reject my idea. It clearly overdone. uuid.py is not so huge.
Can you please elaborate? Do you think that my PR is wrong?
I looked https://github.com/python/cpython/pull/3684 and I wonder if I should recommend to split module before review. So I meant "I stop requesting split module and I'll review the PR3684."
Now I see your PR and it looks more clean.
I would prefer to reorganize uuid.py in a second step
I am not reorganizing uuid.py, just making initialization lazy.
Oh. I was told that the PR 3684 of bpo-5885 in another fix for this issue.
I marked bpo-5885 as a duplicate of this issue, even if it's not exactly the same. The main purpose of this issue was to use uuid_generate_time() using a C extension, as bpo-20519.
+static PyObject +uuid_uuid1(PyObject *self, PyObject *args) +{ + uuid_t out; + uuid_generate_time(out); + return PyByteArray_FromStringAndSize((const char \) out,sizeof(out)); +}
+static PyObject +uuid_uuid4(PyObject *self, PyObject \args) +{ + uuid_t out; + uuid_generate_random(out); + return PyByteArray_FromStringAndSize(out,sizeof(out)); +}
I marked bpo-20519 as a duplicate of this issue, even if it's not exactly the same. The main purpose of this issue was to use uuid_generate_time() using a C extension:
+static PyObject +_uuid_generate_random(void) +{ + uuid_t out; + uuid_generate_random(out); + return PyBytes_FromStringAndSize((const char \) out, sizeof(out)); +}
+static PyObject +_uuid_generate_time(void) +{ + uuid_t out; + uuid_generate_time(out); + return PyBytes_FromStringAndSize((const char \) out, sizeof(out)); +}
I abandonned my PR and started to review Antoine's PR 3796 which basically combines all previous patches proposed last 6 years :-)
New changeset a106aec2ed6ba171838ca7e6ba43c4e722bbecd1 by Antoine Pitrou in branch 'master': bpo-11063, bpo-20519: avoid ctypes and improve import time for uuid (bpo-3796) https://github.com/python/cpython/commit/a106aec2ed6ba171838ca7e6ba43c4e722bbecd1
I ran two benchmarks on my Fedora 26.
git co master ./python -m perf timeit -s 'import sys, uuid' "del sys.modules['uuid']; import uuid; uuid = None" --inherit=PYTHONPATH -v -o import_new.json ./python -m perf timeit -s 'import uuid; u=uuid.uuid1' "u()" --inherit=PYTHONPATH -v -o uuid1_new.json
git co 8d59aca4a953b097a9b02b0ecafef840e4ac5855 ./python -m perf timeit -s 'import uuid; u=uuid.uuid1' "u()" --inherit=PYTHONPATH -v -o uuid1_ref.json ./python -m perf timeit -s 'import sys, uuid' "del sys.modules['uuid']; import uuid; uuid = None" --inherit=PYTHONPATH -v -o import_ref.json
Import:
haypo@selma$ ./python -m perf compare_to import_ref.json import_new.json --table +-----------+------------+-----------------------------+ | Benchmark | import_ref | import_new | +===========+============+=============================+ | timeit | 4.04 ms | 430 us: 9.39x faster (-89%) | +-----------+------------+-----------------------------+
uuid.uuid1():
haypo@selma$ ./python -m perf compare_to uuid1_ref.json uuid1_new.json --table +-----------+-----------+------------------------------+ | Benchmark | uuid1_ref | uuid1_new | +===========+===========+==============================+ | timeit | 18.9 us | 15.2 us: 1.24x faster (-20%) | +-----------+-----------+------------------------------+
Everything is faster. The import time is 9.4x faster, nice!
In practice, the import time is probably even better. My benchmark uses repeated import, it doesn't measure the "first time" import which was more expensive because of the "import ctypes".
Crude import benchmark (Ubuntu):
$ time ./python -c "import uuid"
real 0m0.074s user 0m0.056s sys 0m0.012s
$ time ./python -c "import uuid"
real 0m0.030s user 0m0.028s sys 0m0.000s
$ time ./python -c pass
real 0m0.027s user 0m0.024s sys 0m0.000s
uuid fails to build for me on master since that change landed:
cpython/Modules/_uuidmodule.c:13:11: error: implicit declaration of function 'uuid_generate_time_safe' is invalid in C99 [-Werror,-Wimplicit-function-declaration] res = uuid_generate_time_safe(out); ^ cpython/Modules/_uuidmodule.c:13:11: note: did you mean 'py_uuid_generate_time_safe'? cpython/Modules/_uuidmodule.c:8:1: note: 'py_uuid_generate_time_safe' declared here py_uuid_generate_time_safe(void) ^ cpython/Modules/_uuidmodule.c:13:11: warning: this function declaration is not a prototype [-Wstrict-prototypes] res = uuid_generate_time_safe(out); ^ 1 warning and 1 error generated.
This is on macOS Sierra.
It's expected if uuid_generate_time_safe() isn't available on your platform. But test_uuid still passes?
It's expected if uuid_generate_time_safe() isn't available on your platform. But test_uuid still passes?
I would prefer to avoid compilation errors on a popular platforms like macOS. Would it possible to check if uuid_generate_time_safe() is available, maybe in configure? Or we can "simply" skip _uuid compilation on macOS?
Would it possible to check if uuid_generate_time_safe() is available, maybe in configure?
That's probably possible.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['easy', '3.7', 'library', 'performance']
title = 'Rework uuid module: lazy initialization and add a new C extension'
updated_at =
user = 'https://bugs.python.org/KeithDart'
```
bugs.python.org fields:
```python
activity =
actor = 'ncoghlan'
assignee = 'none'
closed = True
closed_date =
closer = 'berker.peksag'
components = ['Library (Lib)']
creation =
creator = 'Keith.Dart'
dependencies = []
files = ['20685', '29220']
hgrepos = []
issue_num = 11063
keywords = ['patch', 'easy']
message_count = 67.0
messages = ['127442', '127989', '127998', '128001', '128008', '128010', '128012', '128013', '128015', '128016', '128017', '128018', '128019', '128020', '128021', '178293', '178297', '182778', '182873', '182875', '264151', '264998', '265109', '265122', '265126', '265449', '303202', '303205', '303208', '303209', '303210', '303211', '303212', '303213', '303214', '303215', '303218', '303220', '303228', '303230', '303231', '303234', '303238', '303281', '303284', '303288', '303365', '303406', '303525', '303531', '303532', '303533', '303534', '303539', '303540', '303552', '303555', '305587', '305589', '305635', '305897', '305900', '305904', '305909', '305911', '306984', '306985']
nosy_count = 23.0
nosy_names = ['ncoghlan', 'orsenthil', 'kdart', 'pitrou', 'vstinner', 'christian.heimes', 'ned.deily', 'eric.araujo', 'Arfrever', 'r.david.murray', 'methane', 'skrah', 'nvetoshkin', 'lukasz.langa', 'knny-myer', 'nailor', 'Keith.Dart', 'berker.peksag', 'hynek', 'martin.panter', 'serhiy.storchaka', 'Michael.Felt', 'aixtools@gmail.com']
pr_nums = ['3795', '3796', '3684', '3855', '4287', '4343', '4565']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue11063'
versions = ['Python 3.7']
```