plasma-umass / scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Apache License 2.0

AttributeError: 'userbase' with Pytorch #650

Open aishwaryyasarkar opened 1 year ago

aishwaryyasarkar commented 1 year ago

Describe the bug

I'm not sure if this is a bug, but I can't seem to resolve it. Here's the error I am seeing:

 Error in program being profiled:
 'userbase'
Traceback (most recent call last):
  File "/global/homes/s/sark777/.conda/envs/dgl-nightly/lib/python3.11/site-packages/scalene/scalene_profiler.py", line 1857, in profile_code
    exec(code, the_globals, the_locals)
  File "/global/u1/s/sark777/Distributed_DGL/src/test/scalene_test.py", line 1, in <module>
    import torch
  File "<frozen importlib._bootstrap>", line 1173, in _find_and_load
  File "<frozen importlib._bootstrap>", line 173, in __exit__
  File "/global/homes/s/sark777/.conda/envs/dgl-nightly/lib/python3.11/site-packages/scalene/scalene_profiler.py", line 641, in free_signal_handler
    Scalene.enter_function_meta(this_frame, Scalene.__stats)
  File "/global/homes/s/sark777/.conda/envs/dgl-nightly/lib/python3.11/site-packages/scalene/scalene_profiler.py", line 1283, in enter_function_meta
    if not Scalene.should_trace(f.f_code.co_filename, f.f_code.co_name):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/s/sark777/.conda/envs/dgl-nightly/lib/python3.11/site-packages/scalene/scalene_profiler.py", line 1632, in should_trace
    libdir = str(pathlib.Path(sysconfig.get_path(p, n)).resolve())
                              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/s/sark777/.conda/envs/dgl-nightly/lib/python3.11/sysconfig.py", line 626, in get_path
    return get_paths(scheme, vars, expand)[name]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/s/sark777/.conda/envs/dgl-nightly/lib/python3.11/sysconfig.py", line 616, in get_paths
    return _expand_vars(scheme, vars)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/s/sark777/.conda/envs/dgl-nightly/lib/python3.11/sysconfig.py", line 275, in _expand_vars
    res[key] = os.path.normpath(_subst_vars(value, vars))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/s/sark777/.conda/envs/dgl-nightly/lib/python3.11/sysconfig.py", line 251, in _subst_vars
    raise AttributeError(f'{var}') from None
AttributeError: 'userbase'

To Reproduce

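import torch  # first line of scalene_test.py, per the traceback above
from scalene import scalene_profiler

def torchtest():
    # Turn profiling on
    scalene_profiler.start()
    ...  # body elided in the original report
    # Turn profiling off
    scalene_profiler.stop()

torchtest()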

Run with the following command: python -m scalene --cli --html --outfile scalene_test.html scalene_test.py

It's generating the .html file but with the above error. I expected to see something like in the demo. Here's what I see when I open the .html file.

Screenshots

[Screenshot 2023-08-07 at 5 27 54 PM]

Desktop (please complete the following information):

I already tried with the repo version as well as with conda.

Additional context

It works fine with the following command but stops working when I pass --memory.

Works: python -m scalene --cli --html --outfile scalene_test_cpu.html --cpu scalene_test.py

Doesn't work: python -m scalene --cli --html --outfile scalene_test_cpu.html --cpu --memory scalene_test.py

Not sure if it's relevant, but I also tried to generate a .json file. I was not able to upload it. I get the following error.

[Screenshot 2023-08-07 at 3 53 04 PM]
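For anyone trying to narrow this down: the traceback above fails inside sysconfig.get_path(p, n), so one quick check is whether the interpreter's sysconfig can resolve every path on its own, outside Scalene. The following is a minimal diagnostic sketch, not part of Scalene. probe_sysconfig_paths is a hypothetical helper name, and the argument order (path name, then scheme) follows the standard-library signature.

```python
import sysconfig


def probe_sysconfig_paths():
    """Try every (scheme, path name) pair and record any lookup that raises."""
    failures = {}
    for scheme in sysconfig.get_scheme_names():
        for name in sysconfig.get_path_names():
            try:
                sysconfig.get_path(name, scheme)
            except (AttributeError, KeyError) as exc:
                # AttributeError is the error type seen in this report;
                # KeyError can occur when a scheme does not define a path name.
                failures[(scheme, name)] = repr(exc)
    return failures


# 'userbase' is the config variable named in the error message.
print("userbase:", sysconfig.get_config_var("userbase"))
for key, err in probe_sysconfig_paths().items():
    print(key, "->", err)
```

If this prints a userbase value and reports no AttributeError when run with plain python, sysconfig looks healthy in that environment, which would suggest the failure depends on how the profiler calls it rather than on the conda install itself.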

sternj commented 1 year ago

We're having issues with PyTorch all around; its thread pool isn't cooperating well with our memory allocator (I think, at least). I'm still hunting that down, but I'll find that issue, link it to this one, and see whether the fix for it resolves this too.

tlangfor commented 8 months ago

I'm getting the same AttributeError: {'userbase'} when trying to profile a very simple cupy example:

import cupy as cp
import numpy as np

# Create 2D numpy arrays
a = np.random.random(100000000)
a = a.reshape(10000,10000)

b = np.random.random(100000000)
b = b.reshape(10000,10000)

# Move to GPU
g = cp.asarray(a)
h = cp.asarray(b)

# Matrix Mult
out = cp.matmul(g,h)

The full output is:

$ scalene --cli gpu_test.py
Error in program being profiled:
 {'userbase'}
Traceback (most recent call last):
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 1709, in profile_code
    exec(code, the_globals, the_locals)
  File "/vast/palmer/home.grace/tl397/ycrc/workshops/parallel_python/gpu/gpu_test.py", line 1, in <module>
    import cupy as cp
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/cupy/__init__.py", line 4, in <module>
    import numpy as _numpy
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/numpy/__init__.py", line 141, in <module>
    from . import core
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/numpy/core/__init__.py", line 77, in <module>
    from . import defchararray as char
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/numpy/core/defchararray.py", line 262, in <module>
    def str_len(a):
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/numpy/core/overrides.py", line 178, in decorator
    def public_api(*args, **kwargs):
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/functools.py", line 33, in update_wrapper
    def update_wrapper(wrapper,
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 757, in cpu_signal_handler
    Scalene.compute_frames_to_record(),
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 1134, in compute_frames_to_record
    while not Scalene.should_trace(fname, func):
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 1502, in should_trace
    pathlib.Path(sysconfig.get_path(p, n)).resolve()
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 521, in get_path
    return get_paths(scheme, vars, expand)[name]
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 511, in get_paths
    return _expand_vars(scheme, vars)
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 172, in _expand_vars
    _extend_dict(vars, get_config_vars())
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 559, in get_config_vars
    _init_posix(_CONFIG_VARS)
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 430, in _init_posix
    _temp = __import__(name, globals(), locals(), ['build_time_vars'], 0)
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 779, in exec_module
  File "<frozen importlib._bootstrap_external>", line 874, in get_code
  File "<frozen importlib._bootstrap_external>", line 972, in get_data
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 506, in malloc_signal_handler
    Scalene.enter_function_meta(this_frame, Scalene.__stats)
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 1169, in enter_function_meta
    if not Scalene.should_trace(f.f_code.co_filename, f.f_code.co_name):
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 1502, in should_trace
    pathlib.Path(sysconfig.get_path(p, n)).resolve()
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 521, in get_path
    return get_paths(scheme, vars, expand)[name]
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 511, in get_paths
    return _expand_vars(scheme, vars)
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 177, in _expand_vars
    res[key] = os.path.normpath(_subst_vars(value, vars))
  File "/home/tl397/.conda/envs/parallel/lib/python3.8/sysconfig.py", line 158, in _subst_vars
    raise AttributeError('{%s}' % var) from None
AttributeError: {'userbase'}
Scalene: Program did not run for long enough to profile.
NOTE: The GPU is currently running in a mode that can reduce Scalene's accuracy when reporting GPU utilization.
Run once as Administrator or root (i.e., prefixed with `sudo`) to enable per-process GPU accounting.

Any idea what may be going on?
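One possible reading of this traceback (an interpretation, not something confirmed by the maintainers in this thread): sysconfig.get_config_vars() was still inside _init_posix when malloc_signal_handler ran and called sysconfig.get_path again, so the re-entrant call may have seen a config cache that existed but did not yet contain userbase. The sketch below reproduces that general pattern with a toy cache. get_config_vars, handler, and the paths are illustrative stand-ins, not the real sysconfig or Scalene code, and the use of SIGUSR1 makes it POSIX-only.

```python
import signal

_CONFIG_VARS = None  # lazily initialised cache (toy stand-in, not sysconfig's)
observed = []        # what the signal handler saw


def get_config_vars():
    """Build the cache on first use; 'userbase' is deliberately set last."""
    global _CONFIG_VARS
    if _CONFIG_VARS is None:
        _CONFIG_VARS = {}  # from here on the cache is "not None" but incomplete
        _CONFIG_VARS["prefix"] = "/usr"
        # Stand-in for a profiler signal arriving in the middle of initialisation.
        signal.raise_signal(signal.SIGUSR1)
        _CONFIG_VARS["userbase"] = "/home/user/.local"
    return _CONFIG_VARS


def handler(signum, frame):
    # Re-entrant call: the cache is not None, so initialisation is skipped
    # and the half-built dictionary is returned.
    try:
        observed.append(get_config_vars()["userbase"])
    except KeyError as exc:
        observed.append(f"KeyError: {exc}")


previous = signal.signal(signal.SIGUSR1, handler)
try:
    final = get_config_vars()
finally:
    signal.signal(signal.SIGUSR1, previous)  # restore the earlier handler

print(observed)
print(final["userbase"])
```

On a POSIX system the handler should record a KeyError for 'userbase', while the outer call still completes and returns a fully populated cache. That would also be consistent with the error appearing only intermittently, since it depends on when a signal happens to arrive.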

Edit:

I built a clean environment (conda create --name scalene python scalene cupy) and it worked once before failing with the same error as before. This is the full environment:

name: scalene
channels:
  - nvidia
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - bzip2=1.0.8=hd590300_5
  - ca-certificates=2024.2.2=hbcca054_0
  - cloudpickle=3.0.0=pyhd8ed1ab_0
  - cuda-nvrtc=12.3.107=0
  - cuda-version=12.3=h32bc705_2
  - cupy=13.0.0=py312h5add188_3
  - cupy-core=13.0.0=py312h7d03b9e_3
  - cython=3.0.8=py312h30efb56_0
  - fastrlock=0.8.2=py312h30efb56_2
  - jinja2=3.1.3=pyhd8ed1ab_0
  - ld_impl_linux-64=2.40=h41732ed_0
  - libblas=3.9.0=21_linux64_openblas
  - libcblas=3.9.0=21_linux64_openblas
  - libcublas=12.3.4.1=0
  - libcufft=11.0.12.1=0
  - libcurand=10.3.4.107=0
  - libcusolver=11.5.4.101=0
  - libcusparse=12.2.0.103=0
  - libexpat=2.5.0=hcb278e6_1
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=13.2.0=h807b86a_5
  - libgfortran-ng=13.2.0=h69a702a_5
  - libgfortran5=13.2.0=ha4646dd_5
  - libgomp=13.2.0=h807b86a_5
  - liblapack=3.9.0=21_linux64_openblas
  - libnsl=2.0.1=hd590300_0
  - libopenblas=0.3.26=pthreads_h413a1c8_0
  - libsqlite=3.45.1=h2797004_0
  - libstdcxx-ng=13.2.0=h7e041cc_5
  - libuuid=2.38.1=h0b41bf4_0
  - libxcrypt=4.4.36=hd590300_1
  - libzlib=1.2.13=hd590300_5
  - markdown-it-py=3.0.0=pyhd8ed1ab_0
  - markupsafe=2.1.5=py312h98912ed_0
  - mdurl=0.1.2=pyhd8ed1ab_0
  - ncurses=6.4=h59595ed_2
  - numpy=1.26.4=py312heda63a1_0
  - openssl=3.2.1=hd590300_0
  - pip=24.0=pyhd8ed1ab_0
  - psutil=5.9.8=py312h98912ed_0
  - pygments=2.17.2=pyhd8ed1ab_0
  - pynvml=11.4.1=pyhd8ed1ab_0
  - python=3.12.1=hab00c5b_1_cpython
  - python_abi=3.12=4_cp312
  - readline=8.2=h8228510_1
  - rich=13.7.0=pyhd8ed1ab_0
  - scalene=1.5.34=py312h30efb56_0
  - setuptools=69.0.3=pyhd8ed1ab_0
  - tk=8.6.13=noxft_h4845f30_101
  - typing_extensions=4.9.0=pyha770c72_0
  - tzdata=2024a=h0c530f3_0
  - wheel=0.42.0=pyhd8ed1ab_0
  - xz=5.2.6=h166bdaf_0

emeryberger commented 8 months ago

@tlangfor I believe this has now been fixed. Please try installing from the git repo (python3 -m pip install git+https://github.com/plasma-umass/scalene). I have verified that your code works with this version.

tlangfor commented 7 months ago

@emeryberger Thanks for the fix, I've confirmed that it's working for me with the test code. I will give it some more testing and let you know if anything comes up.

I'm planning on teaching a workshop on high-performance Python and I want to use Scalene as the go-to profiler, so I'm also looking forward to this getting pushed out to conda/PyPI.

Thanks!

emeryberger commented 7 months ago

The latest version on pip and conda includes this fix. I am about to release a new version (1.5.37) that fixes a show-stopper UI bug in Scalene on Mac for Python 3.10 and earlier, so please wait to upgrade.
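To check which Scalene release is installed before or after upgrading, here is a small sketch using the standard library's importlib.metadata. installed_version and version_tuple are hypothetical helper names, and the comparison assumes plain numeric release strings such as 1.5.37 (the version mentioned above).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn a release string like '1.5.37' into a tuple of its first three numbers."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


found = installed_version("scalene")
if found is None:
    print("scalene is not installed in this environment")
else:
    at_least = version_tuple(found) >= (1, 5, 37)
    print(f"scalene {found}; at least 1.5.37: {at_least}")
```

Comparing tuples of integers avoids the string-comparison pitfall where "1.5.9" would sort after "1.5.37".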