python / cpython

The Python programming language
https://www.python.org

Resource tracker fails to track filenames with colons on Linux #98896

Open maaleske opened 1 year ago

maaleske commented 1 year ago

Bug report

multiprocessing.resource_tracker fails to track files with a colon (':') in the filename due to a parsing error, which allows resources to leak. I originally encountered this while working with filenames that include ISO 8601 timestamps with colons as time separators.

Minimal example:

$ python -c "from multiprocessing.shared_memory import SharedMemory; shm = SharedMemory(create=True, size=1, name='a:b')"
$ Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/resource_tracker.py", line 199, in main
    cmd, name, rtype = line.strip().decode('ascii').split(':')
ValueError: too many values to unpack (expected 3)
$ ls /dev/shm/a\:b
/dev/shm/a:b

Without the colon, the file is correctly tracked and removed:

$ python -c "from multiprocessing.shared_memory import SharedMemory; shm = SharedMemory(create=True, size=1, name='ab')"
$ /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
$ ls /dev/shm/ab
ls: cannot access '/dev/shm/ab': No such file or directory
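
The parsing failure can be reproduced in isolation by applying the unpacking expression from the traceback to a hand-built message. This is only a minimal sketch: the `cmd:name:rtype` line layout is inferred from the traceback, and the `parse_line` helper, the `REGISTER` command and the leading `/` in the names are assumptions made for illustration.

```python
def parse_line(line: bytes):
    # Same expression as the line shown in the traceback above.
    cmd, name, rtype = line.strip().decode('ascii').split(':')
    return cmd, name, rtype


# Assumed message layout: "cmd:name:rtype\n".
ok = parse_line(b'REGISTER:/ab:shared_memory\n')

try:
    # The colon inside the name yields four fields instead of three.
    parse_line(b'REGISTER:/a:b:shared_memory\n')
    colon_error = None
except ValueError as exc:
    colon_error = str(exc)
```

With a colon in the name, `split(':')` returns four fields, so the three-way unpacking raises the `ValueError` seen in the traceback and the name is never recorded.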

Your environment

saito828koki commented 1 year ago

@maaleske Thanks for your report. In my opinion, making colons unusable in resource names is a reasonable fix for this issue. (We should also check whether other parts of multiprocessing that use resource_tracker, like multiprocessing.synchronize.SemLock, will cause the same problem.)

Are there better ways to fix this issue?

maaleske commented 1 year ago

@saito828koki That is a poor solution, since the issue is still present if the resource is not created by the Python process but just opened through SharedMemory (as was the case in my original encounter with it). Disallowing colons would make SharedMemory unusable in situations where Python is not in charge of the filenames, which I imagine to be a common use of shared memory resources.

I also investigated a bit more, and colons are not the only problem: if the filenames contain newlines (again, valid on Linux) and colons, you can trigger other errors from the same code path. I think there may even be a way to induce the resource tracker to unlink the wrong files by injecting an extra \ncmd:name:rtype\n into the file through the filename, though I haven't found a working example yet.
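
The injection idea can be illustrated with a simplified model of a line-oriented `cmd:name:rtype` protocol. This is a sketch of the parsing only, not a working exploit: `format_message`, `read_messages`, the command names and the `victim` name are all hypothetical, and it does not show that a real tracker can be driven this way.

```python
import io


def format_message(cmd: str, name: str, rtype: str) -> bytes:
    # Assumed wire format: one "cmd:name:rtype" line per message.
    return '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')


def read_messages(stream):
    """Parse each line; keep well-formed ones, collect the rest."""
    parsed, rejected = [], []
    for line in stream:
        try:
            cmd, name, rtype = line.strip().decode('ascii').split(':')
        except ValueError:
            rejected.append(line)
        else:
            parsed.append((cmd, name, rtype))
    return parsed, rejected


# A name that embeds a complete extra message between two newlines.
hostile_name = 'x\nUNREGISTER:victim:shared_memory\ny'
wire = format_message('REGISTER', hostile_name, 'shared_memory')
parsed, rejected = read_messages(io.BytesIO(wire))
```

In this model the single `REGISTER` message is read back as three lines: two malformed fragments that fail to unpack, and one well-formed `UNREGISTER` line for a name the sender never intended to mention.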

Given that Python might not be the one responsible for the filenames, the real fix is to avoid parsing them at all and to store them in a way that does not require parsing. If you want to be rigorous, I think you would have to store each filename in a separate file, and manage the related cmd/rtype in some way. However, there is also an issue related to incorrect behaviour of the resource tracker (#82300), with one fix proposing to remove it altogether for shared memory resources, which would make this a non-issue for SHM, but it would probably still be relevant for semaphores.
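
One way to keep a delimited format without ever parsing the raw name is to encode the name before it goes on the wire. The sketch below is only an illustration of that idea, not the separate-file approach described above and not necessarily what CPython does: `encode_message` and `decode_message` are hypothetical helpers, and it assumes `cmd` and `rtype` are fixed tokens that never contain a colon or newline.

```python
import base64


def encode_message(cmd: str, name: str, rtype: str) -> bytes:
    # URL-safe base64 uses only A-Z, a-z, 0-9, '-', '_' and '=',
    # so the encoded name cannot contain ':' or '\n'.
    raw = name.encode('utf-8', 'surrogateescape')
    token = base64.urlsafe_b64encode(raw).decode('ascii')
    return '{0}:{1}:{2}\n'.format(cmd, token, rtype).encode('ascii')


def decode_message(line: bytes):
    # Splitting is safe here because the name field is base64 text.
    cmd, token, rtype = line.strip().decode('ascii').split(':')
    raw = base64.urlsafe_b64decode(token)
    return cmd, raw.decode('utf-8', 'surrogateescape'), rtype


names = [
    'a:b',
    '2022-10-31T12:00:00',
    'x\nUNREGISTER:victim:shared_memory\ny',
]
round_trips = [
    decode_message(encode_message('REGISTER', n, 'shared_memory'))
    for n in names
]
```

Each name, including the ones with colons and embedded newlines, travels as exactly one line and decodes back unchanged. The cost is that messages are no longer human-readable, and both ends of the pipe have to agree on the encoding.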