The Pillow functions Image.open() and Image.save(), among others, use os.path.realpath() on the input path to transform it to a symlink-free canonical path.
As far as I can tell, nothing afterwards relies on the path being canonical, and realpath() requires an lstat() system call on every path segment from / down to the file in question (or more, if symlinks are encountered along the way). On clusters with distributed filesystems such as BeeGFS, this multiplies the number of metadata I/O operations by a factor greater than or equal to the depth of the image file within the hierarchy, and effectively causes a DDoS of the metadata servers for those filesystems.
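To make the multiplication concrete, here is a rough sketch that estimates the lower bound on lstat() calls for one absolute POSIX path, following the one-call-per-segment behaviour described above. The helper name min_lstat_calls and the example path are hypothetical and are not part of Pillow.

```python
from pathlib import PurePosixPath


def min_lstat_calls(path: str) -> int:
    """Estimate the minimum number of lstat() calls realpath() needs.

    Assumes one lstat() per path segment below the root and no symlinks
    along the way; symlinks would only add to this count.
    """
    p = PurePosixPath(path)
    # For an absolute path, parts[0] is the root "/" itself, which is not
    # counted as a segment here.
    segments = p.parts[1:] if p.is_absolute() else p.parts
    return len(segments)


# Hypothetical dataset layout, used only for illustration.
example = "/data/datasets/imagenet/train/n01440764/img_0001.JPEG"
calls_per_image = min_lstat_calls(example)  # 6 segments below "/"
```

Under these assumptions, a data loader that opens one million such files would issue at least six million lstat() calls through realpath() alone, before any image data is read.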
Pillow is heavily used in AI workloads with image datasets and data loaders, which makes its use of realpath() especially painful. PyTorch's ImageFolder is the classic example.
Please reconsider all uses of realpath() in Pillow. Preferably, remove it entirely. Alternatively, if you absolutely need the last segment of the path resolved to obtain the "true" file extension, consider readlink().
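As an illustration of the readlink() alternative, here is a minimal sketch that resolves only the final path segment before extracting the extension. The function name extension_via_readlink is hypothetical, this is not Pillow's current code, and it follows only one level of symlink.

```python
import os


def extension_via_readlink(path: str) -> str:
    """Return the lowercased extension of the file that `path` names.

    If the last segment is a symlink, use the link target's extension.
    Only a single readlink() call is made on the final segment; parent
    directories are not walked.
    """
    try:
        target = os.readlink(path)
    except OSError:
        # Not a symlink (or not readable as one): use the path as given.
        target = path
    return os.path.splitext(target)[1].lower()
```

This costs one readlink() call per file regardless of directory depth, instead of one lstat() per path segment.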