plasma-umass / scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Apache License 2.0

Scalene breaks code that accesses files on a s3 mountpoint #841

Closed: ywilke closed this issue 1 month ago

ywilke commented 1 month ago

**Describe the bug**
I am trying to profile code that accesses files on an S3 mount created with the official mountpoint-s3 from AWS (https://github.com/awslabs/mountpoint-s3). The code runs fine on its own, but when run under Scalene it can no longer see the files.

**To Reproduce**

  1. Create an EC2 instance with the following AMI: Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240314
  2. Install mountpoint-s3 and mount an S3 bucket:

        /usr/bin/mount-s3 --allow-overwrite --allow-delete $bucket --file-mode 664 --cache /tmp --metadata-ttl 3 --max-cache-size 102400 $mountpoint

  3. Install Scalene version 1.5.43.1 from the repo
  4. Run the script below to test access to files:

        from pathlib import Path
        import os

        input_dir = Path("/mnt/mounted_bucket_dir")

        glob_len = len(list(input_dir.glob("*")))
        print("glob_len", glob_len)

        os_len = len(os.listdir(str(input_dir)))
        print("os_len", os_len)

  5. Run the script normally with Python; it should report how many files are in the directory.
  6. Run the script with Scalene; you will get the following output and error:

    ubuntu@ip-xx-xxx-xx-xx:/home/ubuntu$ scalene /home/ubuntu/path_test.py
    glob_len 0
    Error in program being profiled: [Errno 22] Invalid argument: '/mnt/mounted_bucket_dir'
    Traceback (most recent call last):
      File "/home/ubuntu/miniconda3/envs/XXX/lib/python3.12/site-packages/scalene/scalene_profiler.py", line 1731, in profile_code
        exec(code, the_globals, the_locals)
      File "/home/ubuntu/path_test.py", line 9, in <module>
        os_len = len(os.listdir(str(input_dir)))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
    OSError: [Errno 22] Invalid argument: '/mnt/mounted_bucket_dir'
    Scalene: The specified code did not run for long enough to profile.
    By default, Scalene only profiles code in the file executed and its subdirectories.
    To track the time spent in all files, use the --profile-all option.


pathlib's glob() cannot see any files, and os.listdir() returns an error.

**Expected behavior**
The code should return the files in the mountpoint, just as it does when the script is run normally with Python.

**Additional context**
Switching the input directory to a local path on the EC2 instance works fine with both Python and Scalene. The problem arises only when running the code with Scalene in combination with files on the S3 mountpoint.
emeryberger commented 1 month ago

Successfully reproduced. Not clear what the issue is.

emeryberger commented 1 month ago

The same error happens when running under py-spy, so it's not a Scalene-specific issue. You can observe this with the following command line:

% py-spy record -- python3.10 test-access.py

FWIW, it works with pyinstrument and austin, so it doesn't affect all profilers.

emeryberger commented 1 month ago

I figured out the cause of the issue: when the CPU timer signal interrupts the os.listdir call, it messes with FUSE / mount-s3. I'm not sure why it should do this, but the above PR (now merged) works around it by blocking signals (on Linux/Mac) for any function in os.
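
For illustration, here is a minimal sketch of that kind of workaround, not Scalene's actual implementation: block the profiler's timer signals around a call into os so that a pending signal is only delivered once the FUSE-backed system call has returned. The specific signal set and the choice to wrap os.listdir are assumptions made for the example, and it relies on signal.pthread_sigmask, which is only available on Linux/Mac.

    import functools
    import os
    import signal

    # Assumption: the profiler's CPU timer is delivered via one of these signals.
    TIMER_SIGNALS = {signal.SIGALRM, signal.SIGVTALRM, signal.SIGPROF}

    def block_timer_signals(fn):
        """Run fn with the timer signals blocked; restore the old mask afterwards."""
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Block the signals and remember the previous mask.
            old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, TIMER_SIGNALS)
            try:
                return fn(*args, **kwargs)
            finally:
                # Restoring the mask delivers any signal that arrived while blocked.
                signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        return wrapper

    # Hypothetical patch: guard os.listdir so the readdir on the FUSE mount
    # cannot be interrupted mid-call by the profiling timer.
    os.listdir = block_timer_signals(os.listdir)

Blocking (rather than ignoring) the signal defers the interrupt instead of dropping it, so the profiler still gets its sample as soon as the wrapped call returns.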