Musisoul opened this issue 2 years ago
I am also trying to use MONeT with a newer PyTorch version, and I ran the example code:
```python
import torch, torchvision
from monet.cvxpy_solver import Solution
from monet.monet_wrapper import MONeTWrapper
import time

input = torch.randn(184, 3, 224, 224).cuda()
model = torchvision.models.resnet50()
input_shape = (3, 224, 224)
# Can change to use absolute path instead of relative
sol_file = "/data/dev/MONeT/data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_10.00.pkl"
# import pickle
# with open(sol_file, 'rb') as f:
#     data = pickle.load(f)
#     # print(data)
train_model = MONeTWrapper(model, sol_file, input_shape).cuda()
output = train_model(input)
output.sum().backward()
print("Memory used: %6.2f MB" % (torch.cuda.max_memory_allocated() / 1024 / 1024))
```
and I got the following error message:
```
File "/data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1881, in _run_ninja_build
subprocess.run(
File "/data/tmp/miniconda3/envs/tdy/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/dev/MONeT/monet/lm_ops/compress.py", line 6, in <module>
compress_cpp = load(name="compress_cpp", sources=[this_dir / "compress.cpp", this_dir / "compress.cu"], extra_cflags=['-std=c++17', '-lcusparse'], extra_cuda_cflags=['-lcusparse'],extra_ldflags=['-lcusparse'])
File "/data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1265, in load
return _jit_compile(
File "/data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1489, in _jit_compile
_write_ninja_file_and_build_library(
File "/data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1604, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1897, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'compress_cpp': [1/2] c++ -MMD -MF compress.o.d -DTORCH_EXTENSION_NAME=compress_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include -isystem /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/TH -isystem /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/tmp/miniconda3/envs/tdy/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -std=c++17 -lcusparse -c /data/dev/MONeT/monet/lm_ops/compress.cpp -o compress.o
FAILED: compress.o
c++ -MMD -MF compress.o.d -DTORCH_EXTENSION_NAME=compress_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include -isystem /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/TH -isystem /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/tmp/miniconda3/envs/tdy/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -std=c++17 -lcusparse -c /data/dev/MONeT/monet/lm_ops/compress.cpp -o compress.o
/data/dev/MONeT/monet/lm_ops/compress.cpp: In function ‘std::tuple<at::Tensor, at::Tensor, at::Tensor> compress_csr_256(const at::Tensor&, const at::Tensor&, size_t)’:
/data/dev/MONeT/monet/lm_ops/compress.cpp:31:17: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
if (ip.type().is_cuda()) {
^
In file included from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
from /data/dev/MONeT/monet/lm_ops/compress.cpp:1:
/data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:214:30: note: declared here
DeprecatedTypeProperties & type() const {
^
/data/dev/MONeT/monet/lm_ops/compress.cpp:40:29: error: converting to ‘std::tuple<at::Tensor, at::Tensor, at::Tensor>’ from initializer list would use explicit constructor ‘constexpr std::tuple< <template-parameter-1-1> >::tuple(_UElements&& ...) [with _UElements = {at::Tensor&, at::Tensor&, at::Tensor&}; <template-parameter-2-2> = void; _Elements = {at::Tensor, at::Tensor, at::Tensor}]’
return {cip, idx, rowidx};
^
/data/dev/MONeT/monet/lm_ops/compress.cpp: In function ‘at::Tensor uncompress_csr_256(const at::Tensor&, const at::Tensor&, const at::Tensor&, size_t)’:
/data/dev/MONeT/monet/lm_ops/compress.cpp:47:45: warning: narrowing conversion of ‘((N + 255ul) / 256ul)’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
torch::Tensor op = torch::zeros({(N+255)/256,256}, torch::dtype(torch::kFloat32).device(compip.device()));
^
/data/dev/MONeT/monet/lm_ops/compress.cpp:47:45: warning: narrowing conversion of ‘((N + 255ul) / 256ul)’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/data/dev/MONeT/monet/lm_ops/compress.cpp:50:21: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
if (compip.type().is_cuda()) {
^
In file included from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
from /data/dev/MONeT/monet/lm_ops/compress.cpp:1:
/data/tmp/miniconda3/envs/tdy/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:214:30: note: declared here
DeprecatedTypeProperties & type() const {
^
ninja: build stopped: subcommand failed.
```
I am using PyTorch 1.13.0.dev20220801+cu113 and g++ (GCC) 5.4.0.
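(For reference, the actual compile error above, "converting to 'std::tuple<...>' from initializer list would use explicit constructor", is a known quirk of the libstdc++ that ships with GCC 5.x when a brace-enclosed list is returned from a function returning `std::tuple`. A change along the following lines usually sidesteps it; this is an untested sketch that reuses the variable names from the error output, not the actual `compress.cpp`:)

```cpp
#include <tuple>
#include <torch/extension.h>

// Untested sketch: with GCC 5.x, `return {cip, idx, rowidx};` from a function
// returning std::tuple<at::Tensor, at::Tensor, at::Tensor> selects an explicit
// tuple constructor and fails to compile. Constructing the tuple explicitly
// compiles on both old and new toolchains.
std::tuple<at::Tensor, at::Tensor, at::Tensor>
make_result(at::Tensor cip, at::Tensor idx, at::Tensor rowidx) {
    return std::make_tuple(cip, idx, rowidx);
}
```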
> Can you post the detailed configurations, including PyTorch, CUDA, g++, etc.?
Please see https://github.com/utsaslab/MONeT/blob/master/install.sh#L11:

```
conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=10.1 -c pytorch -y
```
Thanks for checking out MONeT.
MONeT uses the `at` and `at::native` namespaces in order to implement other functions like output-activated backward operations. If the ATen operations are deprecated or signature-modified in later versions, that could also cause a problem. Identifying which function is failing will help to update it to newer versions of PyTorch.
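For instance, the deprecation warnings in the build log above come from the old `Tensor::type()` accessors. A minimal, illustrative sketch of the replacement those warnings suggest (not the actual MONeT source) would be:

```cpp
#include <torch/extension.h>

// Illustrative only: mirrors the kind of check the build log flags in compress.cpp.
at::Tensor check_input(const at::Tensor& ip) {
    // deprecated form from the warning: if (ip.type().is_cuda()) { ... }
    if (ip.is_cuda()) {
        // deprecated form: ip.type().scalar_type(); current form:
        TORCH_CHECK(ip.scalar_type() == at::kFloat, "expected a float32 CUDA tensor");
    }
    return ip.contiguous();
}
```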
Thanks for your work! Currently we have two questions:

1. The previous error came from the `load` function when running `examples/training.py`. We have also tried PyTorch 1.5.0 with CUDA 10.1; with that setup we did not get the previous error, but got `cuDNN error: CUDNN_STATUS_EXECUTION_FAILED` at the forward function in `monet/lm_ops/bn.py`, and the program (`examples/training.py`) took a long time on initialization. Can you post the detailed configurations, including PyTorch, CUDA, g++, etc.?
2. Solutions are generated with `python cvxpy_solver.py MODEL ...`, and the model format should be `"torchvision.models.<model>()"`. Can we use MONeT to generate solutions for our own models?
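(For context on the second question: a constructor string in the `"torchvision.models.<model>()"` form can in principle be resolved to a model object with a pattern like the one below. This is only a generic sketch of that string-constructor convention, not necessarily how `cvxpy_solver.py` actually interprets its MODEL argument.)

```python
import torchvision


def model_from_string(model_str, extra_namespace=None):
    """Build a model from a constructor string such as "torchvision.models.resnet50()".

    Generic sketch only; MONeT's own parsing may differ. A custom model would
    need its package added to the namespace below (or an equivalent mechanism).
    """
    namespace = {"torchvision": torchvision}
    if extra_namespace:
        namespace.update(extra_namespace)
    return eval(model_str, namespace)


# the stock case from the question:
model = model_from_string("torchvision.models.resnet50()")
```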