msr-fiddle / pipedream

Error occurs in profiler after I updated pytorch #27

Closed kanonjz closed 4 years ago

kanonjz commented 4 years ago
Traceback (most recent call last):
  File "main.py", line 574, in <module>
    main()
  File "main.py", line 266, in main
    per_layer_times, data_time = profile_train(train_loader, model, criterion, optimizer)
  File "main.py", line 345, in profile_train
    with torchprofiler.Profiling(model, module_whitelist=[]) as p:
  File "../torchmodules/torchprofiler/profiling.py", line 25, in __enter__
    self.start()
  File "../torchmodules/torchprofiler/profiling.py", line 93, in start
    self.hook_modules(self.model)
  File "../torchmodules/torchprofiler/profiling.py", line 120, in hook_modules
    self.hook_modules(sub_module)
  File "../torchmodules/torchprofiler/profiling.py", line 120, in hook_modules
    self.hook_modules(sub_module)
  File "../torchmodules/torchprofiler/profiling.py", line 122, in hook_modules
    sub_module.reset_hooks()
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'Conv2d' object has no attribute 'reset_hooks'

kanonjz commented 4 years ago

Hi @deepakn94, I don't know why this error occurs after I updated PyTorch in Docker. Is it related to the file pre_hook.patch?

deepakn94 commented 4 years ago

Yes, please apply the patch and then re-build.
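Before re-running the profiler, it may help to confirm whether the installed build actually carries the patched hooks. The snippet below is only a sketch: `missing_hook_methods` is a hypothetical helper, and it assumes (based on the traceback above) that `reset_hooks` is one of the methods the patch provides.

```python
def missing_hook_methods(module_cls, required=("reset_hooks",)):
    """Return the names in `required` that `module_cls` does not define.

    `required` defaults to the one method named in the traceback; the
    patch may add others, so extend the tuple if needed (assumption).
    """
    return [name for name in required if not hasattr(module_cls, name)]


# Demonstration with stand-in classes (not real PyTorch classes).
class StockModule:
    pass


class PatchedModule:
    def reset_hooks(self):
        pass


print(missing_hook_methods(StockModule))    # the unpatched stand-in lacks the method
print(missing_hook_methods(PatchedModule))  # the patched stand-in has it
```

With PyTorch installed, calling `missing_hook_methods(torch.nn.Module)` and getting an empty list back would suggest the patched build is the one being imported.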

kanonjz commented 4 years ago

How do I apply the patch?

deepakn94 commented 4 years ago

You could try building the container directly using the Dockerfile provided in the repository: https://github.com/msr-fiddle/pipedream/blob/master/Dockerfile.

Or you could try running something like this directly (where pre_hook.patch is the patch in the repository, and pytorch is the directory containing the cloned PyTorch repository):

cd pytorch && patch -p1 < pre_hook.patch && \
    TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5+PTX" \
    CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
    NCCL_INCLUDE_DIR="/usr/include/" \
    NCCL_LIB_DIR="/usr/lib/" \
    python setup.py install && python setup.py clean

kanonjz commented 4 years ago

OK, I'll try. I still don't quite understand what the patch does.

deepakn94 commented 4 years ago

It adds an implementation of a pre-hook that's not available in current PyTorch. We need it to time operators in the profiling step.
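To illustrate why a pre-hook matters for timing, here is a minimal sketch of the general pattern: a hook that fires before a module runs records a start time, and a hook that fires afterwards computes the elapsed time. It does not use PyTorch and does not reproduce the patch; `ToyModule` and `Timer` are hypothetical names invented for this example.

```python
import time


class ToyModule:
    """Hypothetical stand-in for a framework module that supports hooks."""

    def __init__(self, name, work):
        self.name = name
        self.work = work        # callable doing the module's computation
        self.pre_hooks = []     # called before the computation
        self.post_hooks = []    # called after the computation

    def __call__(self, x):
        for hook in self.pre_hooks:
            hook(self, x)
        out = self.work(x)
        for hook in self.post_hooks:
            hook(self, x, out)
        return out


class Timer:
    """Accumulates per-module elapsed time from a pre-hook/post-hook pair."""

    def __init__(self):
        self.start = {}
        self.elapsed = {}

    def attach(self, module):
        module.pre_hooks.append(self._before)
        module.post_hooks.append(self._after)

    def _before(self, module, inputs):
        # Without a hook at this point there is no start timestamp.
        self.start[module.name] = time.perf_counter()

    def _after(self, module, inputs, output):
        dt = time.perf_counter() - self.start.pop(module.name)
        self.elapsed[module.name] = self.elapsed.get(module.name, 0.0) + dt


timer = Timer()
layers = [ToyModule("double", lambda x: x * 2), ToyModule("inc", lambda x: x + 1)]
for layer in layers:
    timer.attach(layer)

value = 3
for layer in layers:
    value = layer(value)

for name, seconds in timer.elapsed.items():
    print(f"{name}: {seconds:.9f} s")
```

If only the post-hook existed, the timer would know when each module finished but not when it started, which is the gap a pre-hook fills.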

kanonjz commented 4 years ago

Thanks!! @deepakn94

deepakn94 commented 4 years ago

Going to close this.

jiashenC commented 4 years ago

I just want to try out the profiler.

Can I run the profiler without Docker?

I am trying to compile PyTorch with this patch applied, but I am getting some errors. Which base version of PyTorch is this patch for?