Rework uuid module: lazy initialization and add a new C extension

dfcc25ac-e242-4495-8f49-176193f06a74 commented 13 years ago

BPO	11063
Nosy	@ncoghlan, @orsenthil, @pitrou, @vstinner, @tiran, @ned-deily, @merwok, @bitdancer, @methane, @skrah, @ambv, @berkerpeksag, @hynek, @vadmium, @serhiy-storchaka, @aixtools
PRs	python/cpython#3795 python/cpython#3796 python/cpython#3684 python/cpython#3855 python/cpython#4287 python/cpython#4343 python/cpython#4565
Files	issue11063.patch: Fix issue python/cpython#55272 issue11063_2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = created_at = labels = ['easy', '3.7', 'library', 'performance'] title = 'Rework uuid module: lazy initialization and add a new C extension' updated_at = user = 'https://bugs.python.org/KeithDart' ``` bugs.python.org fields: ```python activity = actor = 'ncoghlan' assignee = 'none' closed = True closed_date = closer = 'berker.peksag' components = ['Library (Lib)'] creation = creator = 'Keith.Dart' dependencies = [] files = ['20685', '29220'] hgrepos = [] issue_num = 11063 keywords = ['patch', 'easy'] message_count = 67.0 messages = ['127442', '127989', '127998', '128001', '128008', '128010', '128012', '128013', '128015', '128016', '128017', '128018', '128019', '128020', '128021', '178293', '178297', '182778', '182873', '182875', '264151', '264998', '265109', '265122', '265126', '265449', '303202', '303205', '303208', '303209', '303210', '303211', '303212', '303213', '303214', '303215', '303218', '303220', '303228', '303230', '303231', '303234', '303238', '303281', '303284', '303288', '303365', '303406', '303525', '303531', '303532', '303533', '303534', '303539', '303540', '303552', '303555', '305587', '305589', '305635', '305897', '305900', '305904', '305909', '305911', '306984', '306985'] nosy_count = 23.0 nosy_names = ['ncoghlan', 'orsenthil', 'kdart', 'pitrou', 'vstinner', 'christian.heimes', 'ned.deily', 'eric.araujo', 'Arfrever', 'r.david.murray', 'methane', 'skrah', 'nvetoshkin', 'lukasz.langa', 'knny-myer', 'nailor', 'Keith.Dart', 'berker.peksag', 'hynek', 'martin.panter', 'serhiy.storchaka', 'Michael.Felt', 'aixtools@gmail.com'] pr_nums = ['3795', '3796', '3684', '3855', '4287', '4343', '4565'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'resource usage' url = 'https://bugs.python.org/issue11063' versions = ['Python 3.7'] ```

dfcc25ac-e242-4495-8f49-176193f06a74 commented 13 years ago

When the uuid.py module is simply imported it has the side effect of forking a subprocess (/sbin/ldconfig) and doing a lot of stuff find a uuid implementation by ctypes. This is undesirable in many contexts. It would be better to perform those tasks on demand, when the first UUID is actually requested. In general, imports should avoid unnecessary system call side effects. This also makes testing easier.

13da56d0-ffd6-4df2-acc1-58ae1cd87921 commented 13 years ago

I think launching external tools like ifconfig and ipconfig can be avoided pretty easily. There are many recipes around the net how to use native API's. About ctypes' horrible logic during find_library call - don't know yet.

32810b5e-93e8-40e6-9880-9c5964c3c937 commented 13 years ago

With the attached patch the "heavy work" will be done on request, when calling uuid1() or uuid4() not on import.

I am working off from the py3k svn branch. Is it necessary to submit a separate patch for py2 branch?

orsenthil commented 13 years ago

Kenny, I don't see a problem with uuid is *imported*, it just creates a couple of STANDARD UUID class objects for use later. And this seems to just set the number and validates it. I don't see any subprocess calls performed. Perhaps you were referring to scenarios of using uuid1/uuid5 methods in mac and suggesting improvements to it by your patch?

bitdancer commented 13 years ago

If you do 'python -c "import uuid" under strace, _posixsubprocess is definitely loaded, and a pipe2 call is made.

Take a look at the code starting at (py3k trunk) line 418 (try:). That's where the weird stuff happens, which is what the patch is addressing.

Ken: thanks for working on this.

pitrou commented 13 years ago

Thanks for posting a patch! I have two comments:

Have you run test_uuid? When I run it, it seems to go into an infinite loop somewhere and I need to kill the process.
uuid should work even when ctypes is not available, so you can't just put an import statement at the top-level without a fallback

13da56d0-ffd6-4df2-acc1-58ae1cd87921 commented 13 years ago

uuid should work even when ctypes is not available A bit of offtopic: why can't we assume that ctypes is available?

pitrou commented 13 years ago

A bit of offtopic: why can't we assume that ctypes is available?

Because ctypes (or, actually, the libffi it relies on) needs specific low-level code for each platform it runs on, and not all platforms have such code.

Another reason is that ctypes is dangerous and some administrators might prefer to disable it (especially on shared hosting ala Google App Engine).

32810b5e-93e8-40e6-9880-9c5964c3c937 commented 13 years ago

Thanks for pointing that out! I guess that is the reason you did the import in a try block.

13da56d0-ffd6-4df2-acc1-58ae1cd87921 commented 13 years ago

Maybe I understood and ctypes ImportError simply must be handled and fallbacked to something else. But there are only 3 ways of getting MAC address:

using popen
using ctypes and native calls
using C API and performing native calls in extension

And ctypes seems to be the best choice: it's portable across Python VMs (better that 3) and faster (better than 1).

pitrou commented 13 years ago

Maybe I understood and ctypes ImportError simply must be handled and fallbacked to something else.

Indeed.

But there are only 3 ways of getting MAC address:

using popen

using ctypes and native calls

using C API and performing native calls in extension

And ctypes seems to be the best choice: it's portable across Python VMs (better that 3) and faster (better than 1).

Perhaps, but doing without ctypes should still be possible, otherwise it's a regression.

4b5d7c10-21e6-498b-929f-b5b74f201a21 commented 13 years ago

It's also possible using existing wrapped os system calls. One exaple is here: http://code.google.com/p/pycopia/source/browse/trunk/aid/pycopia/ifconfig.py

Although that one doesn't current support MAC addresses, but it could. The socket module also now support the netlink socket on Linux, so it shouldbe possible to use that for getting MAC address on Linux.

13da56d0-ffd6-4df2-acc1-58ae1cd87921 commented 13 years ago

It's also possible using existing wrapped os system calls. That's right, on linux we can use ioctls but windows would require win api calls like this one: http://stackoverflow.com/questions/166506/finding-local-ip-addresses-in-python/166992#166992

4b5d7c10-21e6-498b-929f-b5b74f201a21 commented 13 years ago

I'm thinking Python could use a general purpose ifconfig/mac-layer module that uuid.py could then just use.

pitrou commented 13 years ago

I'm thinking Python could use a general purpose ifconfig/mac-layer module that uuid.py could then just use.

Perhaps, but that's really out of scope for this issue. Feel free to open another issue.

hynek commented 11 years ago

The patch hasn’t incorporated Antoine’s comments AFAICT.

Also I don’t see this fit for back porting to bug fix releases. Correct me if I’m wrong.

tiran commented 11 years ago

Hynek, you are right.

df009fb1-c942-43d2-8a7e-62c4f2da0ada commented 11 years ago

The implementation does not actually end up in infinite loop, just repeating the loading of the CDLL is slow.

I added caching to that and fixed the ctypes imports too.

hynek commented 11 years ago

Jyrki, roundup doesn’t seem to recognize you patch so we can’t review it in Rietveld. Could you re-try, maybe using hg?

df009fb1-c942-43d2-8a7e-62c4f2da0ada commented 11 years ago

Here's a second take on the patch

vadmium commented 8 years ago

One thing I am wondering about is why we have to use find_library() at all. Wouldn’t it be more robust, and more efficient, to call CDLL() directly? We just have to know the exactly library version we are expecting. On Linux, the full soname is libuuid.so.1. It seems on OS X it is called libc.dylib (but it would be good for someone else to confirm).

# The uuid_generate_* routines are provided by libuuid on at least
# Linux and FreeBSD, and provided by libc on Mac OS X.
if sys.platform == "darwin":
    libname = "libc.dylib"
else:
    libname = "libuuid.so.1"
_ctypes_lib = ctypes.CDLL(libname)

aixtools commented 8 years ago

I cannot comment on uuid directly, but for me, this is yet another example of how assumptions can break things.

imho - if you know the exact version of s shared library that you want, calling cdll directly should be find. Maybe find_library is historic.

However, an advantage to calling find_library is that it should lead to an OSError if it cannot be loaded. But trapping OSError on a call to ctypes.cdll or testing for None (NULL) from find_library() is the option of a developer using the ctypes interface.

As far as libFOO.so.N - do you always want to assume it is going to be version N, or are you expecting to be able to work work version N+X? Again, find_library() can give you the library name it would load - but a programmer must be aware of the platform differences (e.g., AIX returns not a filename, but a libFOO.a(libFOO.so.N) - as just one example. p.s. as far as ldconfig being part of the problem on AIX (as it does not exist and can lead to OSError or just long delays, I have worked out a (pending) patch for ctypes/util (and ctypes/cdll) that may address the issues with uuid import on AIX. (see bpo-26439 for the patch)

vadmium commented 8 years ago

The versioning problem with libFOO.so.N already occurs with compiled programs. A C program compiled against libuuid.so.1 will fail to load if you only have libuuid.so.2. On the other hand, a Python program using find_library() will find either version. My point about robustness is that if a version 2 is invented, it might have different semantics or function signatures, and Python would then be assuming the wrong semantics.

tiran commented 8 years ago

It sounds like ctypes is causing you some headache. How about we get rid of ctypes for uuid and replace it with a simple implementation in C instead? Autoconf (configure) can take care of the library detection easily.

vadmium commented 8 years ago

There is already bpo-20519 for that, although it looks like the proposed patch keeps ctypes as a fall-back. (My interest here is only theoretical.)

349513ba-0efb-4c7b-884a-6c4d85313e4a commented 8 years ago

The way I have seen that resolved - in many locations, is to have the option to specify a specific version, e.g., libFOO.so.1 and then as long as that version remains available, perhaps as read-only for running with existing programs,they continue to work even when libFOO.so.2. However, for some who believes, or expects, to be able to deal with potential ABI changes - they can specify simply libFOO.so and let the loader decide (what I see is that libFOO.so is a symbolic link to, or a copy of the latest version.)

So, I agree wholeheartedly, if a versioned number is requested, that, or nothing, should be returned. However, if it is generic - try to find a generic named library (and get the link or copy), and when that is not available either, take the latest versioned number.

It has been over two months, and I may have read it wrong - but that appears to be what the current "ldconfig -p" solution implements. (in ctypes, not uuid, so perhaps this is not the correct place to be responding. If so, my apologies).

On 08-May-16 05:24, Martin Panter wrote:

Martin Panter added the comment:

The versioning problem with libFOO.so.N already occurs with compiled programs. A C program compiled against libuuid.so.1 will fail to load if you only have libuuid.so.2. On the other hand, a Python program using find_library() will find either version. My point about robustness is that if a version 2 is invented, it might have different semantics or function signatures, and Python would then be assuming the wrong semantics.

----------

Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue11063\

methane commented 7 years ago

How often uuid1 is used?

I never use it and it looks uuid1 makes uuid.py complicated. How about split it to _uuid1.py (or uuid/init.py and uuid/_uuid1.py)?

I hope PEP-562 is accepted. It ease splitting out (heavy and slow and dirty) part into submodule without breaking backward compatibility.

Without PEP-562, easy way is making proxy function.

# uuid/init.py

def uuid1(node=None, clock_seq=None):
    """Generate a UUID from a host ID, sequence number, and the current time.
    If 'node' is not given, getnode() is used to obtain the hardware
    address.  If 'clock_seq' is given, it is used as the sequence number;
    otherwise a random 14-bit sequence number is chosen."""
    from . import _uuid1 
    return _uuid1.uuid1()

methane commented 7 years ago

Sorry, I reject my idea. It clearly overdone. uuid.py is not so huge.

vstinner commented 7 years ago

It ease splitting out (heavy and slow and dirty) part into submodule without breaking backward compatibility.

Right. I created attached PR bpo-3795 to implement this idea.

I hope PEP-562 is accepted.

I don't think that this PEP is needed here. IMHO a new _uuid1 module is enoguh to avoid "side effects" on "import uuid".

Sorry, I reject my idea. It clearly overdone. uuid.py is not so huge.

Can you please elaborate? Do you think that my PR is wrong?

vstinner commented 7 years ago

uuid code contains 4 bare "except:" blocks which should be replaced with appropriate exceptions, or at least "except Exception:" to not catch KeyboardInterrupt. But this should be modified in a second time.

pitrou commented 7 years ago

I think https://github.com/python/cpython/pull/3796 is a better resolution. It creates an optional _uuid C extension to avoid ctypes if possible *and* also loads system functions lazily.

vstinner commented 7 years ago

I think launching external tools like ifconfig and ipconfig can be avoided pretty easily. There are many recipes around the net how to use native API's.

That's right, but I would prefer to enhance uuid to use native API's in a different issue and focus on avoiding "side effects" on "import uuid" in this issue.

Do you know a native APIs for Windows and Linux? I don't. Do you have links to these "recipes"? If yes, please open a new issue.

pitrou commented 7 years ago

Do you know a native APIs for Windows and Linux?

On Linux, we already use uuid_generate_time(). See _unixdll_get_node().

pitrou commented 7 years ago

And on Windows, _windll_getnode() calls into _UuidCreate().

vstinner commented 7 years ago

Antoine: "I think https://github.com/python/cpython/pull/3796 is a better resolution. It creates an optional _uuid C extension to avoid ctypes if possible *and* also loads system functions lazily."

Implementing bpo-20519 is a very good idea. But I still like the idea of putting all these ugly functions to get the node and geneate a UUID1 object in a different module, since most of these code is not used on most platforms.

Antoine: Would you mind to modify your PR 3796 to only implement bpo-20519? I would prefer to reorganize uuid.py in a second step. It will be easy to review and easy to discuss what is the best option.

vstinner commented 7 years ago

It's nice to see things moving in this 6 years old issue :-)

methane commented 7 years ago

> Sorry, I reject my idea. It clearly overdone. uuid.py is not so huge.

Can you please elaborate? Do you think that my PR is wrong?

I looked https://github.com/python/cpython/pull/3684 and I wonder if I should recommend to split module before review. So I meant "I stop requesting split module and I'll review the PR3684."

Now I see your PR and it looks more clean.

pitrou commented 7 years ago

I would prefer to reorganize uuid.py in a second step

I am not reorganizing uuid.py, just making initialization lazy.

vstinner commented 7 years ago

Oh. I was told that the PR 3684 of bpo-5885 in another fix for this issue.

pitrou commented 7 years ago

PR 3684 seems mostly a subset of what I'm proposing.

vstinner commented 7 years ago

I marked bpo-5885 as a duplicate of this issue, even if it's not exactly the same. The main purpose of this issue was to use uuid_generate_time() using a C extension, as bpo-20519.

+static PyObject +uuid_uuid1(PyObject *self, PyObject *args) +{ + uuid_t out; + uuid_generate_time(out); + return PyByteArray_FromStringAndSize((const char \) out,sizeof(out)); +}

+static PyObject +uuid_uuid4(PyObject *self, PyObject \args) +{ + uuid_t out; + uuid_generate_random(out); + return PyByteArray_FromStringAndSize(out,sizeof(out)); +}

vstinner commented 7 years ago

I marked bpo-20519 as a duplicate of this issue, even if it's not exactly the same. The main purpose of this issue was to use uuid_generate_time() using a C extension:

+static PyObject +_uuid_generate_random(void) +{ + uuid_t out; + uuid_generate_random(out); + return PyBytes_FromStringAndSize((const char \) out, sizeof(out)); +}

+static PyObject +_uuid_generate_time(void) +{ + uuid_t out; + uuid_generate_time(out); + return PyBytes_FromStringAndSize((const char \) out, sizeof(out)); +}

vstinner commented 7 years ago

I abandonned my PR and started to review Antoine's PR 3796 which basically combines all previous patches proposed last 6 years :-)

pitrou commented 7 years ago

New changeset a106aec2ed6ba171838ca7e6ba43c4e722bbecd1 by Antoine Pitrou in branch 'master': bpo-11063, bpo-20519: avoid ctypes and improve import time for uuid (bpo-3796) https://github.com/python/cpython/commit/a106aec2ed6ba171838ca7e6ba43c4e722bbecd1

vstinner commented 7 years ago

I ran two benchmarks on my Fedora 26.

new = master (commit a106aec2ed6ba171838ca7e6ba43c4e722bbecd1)
ref = commit 8d59aca4a953b097a9b02b0ecafef840e4ac5855

git co master ./python -m perf timeit -s 'import sys, uuid' "del sys.modules['uuid']; import uuid; uuid = None" --inherit=PYTHONPATH -v -o import_new.json ./python -m perf timeit -s 'import uuid; u=uuid.uuid1' "u()" --inherit=PYTHONPATH -v -o uuid1_new.json

git co 8d59aca4a953b097a9b02b0ecafef840e4ac5855 ./python -m perf timeit -s 'import uuid; u=uuid.uuid1' "u()" --inherit=PYTHONPATH -v -o uuid1_ref.json ./python -m perf timeit -s 'import sys, uuid' "del sys.modules['uuid']; import uuid; uuid = None" --inherit=PYTHONPATH -v -o import_ref.json

Import:

uuid.uuid1():

Everything is faster. The import time is 9.4x faster, nice!

In practice, the import time is probably even better. My benchmark uses repeated import, it doesn't measure the "first time" import which was more expensive because of the "import ctypes".

pitrou commented 7 years ago

Crude import benchmark (Ubuntu):

before:

$ time ./python -c "import uuid"

real 0m0.074s user 0m0.056s sys 0m0.012s

after:

$ time ./python -c "import uuid"

real 0m0.030s user 0m0.028s sys 0m0.000s

baseline:

$ time ./python -c pass

real 0m0.027s user 0m0.024s sys 0m0.000s

ambv commented 7 years ago

uuid fails to build for me on master since that change landed:

cpython/Modules/_uuidmodule.c:13:11: error: implicit declaration of function 'uuid_generate_time_safe' is invalid in C99 [-Werror,-Wimplicit-function-declaration] res = uuid_generate_time_safe(out); ^ cpython/Modules/_uuidmodule.c:13:11: note: did you mean 'py_uuid_generate_time_safe'? cpython/Modules/_uuidmodule.c:8:1: note: 'py_uuid_generate_time_safe' declared here py_uuid_generate_time_safe(void) ^ cpython/Modules/_uuidmodule.c:13:11: warning: this function declaration is not a prototype [-Wstrict-prototypes] res = uuid_generate_time_safe(out); ^ 1 warning and 1 error generated.

This is on macOS Sierra.

pitrou commented 7 years ago

It's expected if uuid_generate_time_safe() isn't available on your platform. But test_uuid still passes?

vstinner commented 7 years ago

It's expected if uuid_generate_time_safe() isn't available on your platform. But test_uuid still passes?

I would prefer to avoid compilation errors on a popular platforms like macOS. Would it possible to check if uuid_generate_time_safe() is available, maybe in configure? Or we can "simply" skip _uuid compilation on macOS?

pitrou commented 7 years ago

Would it possible to check if uuid_generate_time_safe() is available, maybe in configure?

That's probably possible.

python / cpython

Rework uuid module: lazy initialization and add a new C extension #55272