mapillary / inplace_abn

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

RuntimeError when running with CPU only #167

Open rasmith opened 4 years ago

rasmith commented 4 years ago

I am getting a runtime error when trying to run on a MacBook Pro with CPU only. The failure looks like this:

/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[000] v n l2:0.0318 l1:0.1383  etap:0:0:0.00 eta:1:12:36.55 4 0
Process Process-4:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "./train_and_evaluate_tasks.py", line 32, in run_trainer
    t.train()
  File "/Users/randallsmith/projects/git/relighting/trainers/gan_trainer.py", line 318, in train
    self.task_cfg)
  File "/Users/randallsmith/projects/git/relighting/trainers/gan_trainer.py", line 99, in perform_gan_step
    loss_generator.backward()
  File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/autograd/function.py", line 189, in wrapper
    outputs = fn(ctx, *args)
  File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/inplace_abn/functions.py", line 112, in backward
    y_act, dy_act, weight, bias, ctx.eps, ctx.activation, ctx.activation_param)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead. (view at /Users/distiller/project/conda/conda-bld/pytorch_1579022061893/work/aten/src/ATen/native/TensorShape.cpp:1175)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x10d279267 in libc10.dylib)
frame #1: at::native::view(at::Tensor const&, c10::ArrayRef<long long>) + 827 (0x11862462b in libtorch.dylib)
frame #2: at::CPUType::(anonymous namespace)::view(at::Tensor const&, c10::ArrayRef<long long>) + 61 (0x1188457ed in libtorch.dylib)
frame #3: c10::detail::wrap_kernel_functor_unboxed_<c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(at::Tensor const&, c10::ArrayRef<long long>), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::ArrayRef<long long> > >, at::Tensor (at::Tensor const&, c10::ArrayRef<long long>)>::call(c10::OperatorKernel*, at::Tensor const&, c10::ArrayRef<long long>) + 24 (0x11883cef8 in libtorch.dylib)
frame #4: at::Tensor c10::KernelFunction::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(at::Tensor const&, c10::ArrayRef<long long>) const + 63 (0x1182a607f in libtorch.dylib)
frame #5: std::__1::result_of<at::Tensor (ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>::type c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > >::read<at::Tensor c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<long long>) const::'lambda'(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>(at::Tensor&&) const + 175 (0x1182a5fcf in libtorch.dylib)
frame #6: std::__1::result_of<at::Tensor (c10::DispatchTable const&)>::type c10::LeftRight<c10::DispatchTable>::read<at::Tensor c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<long long>) const::'lambda'(c10::DispatchTable const&)>(at::Tensor&&) const + 115 (0x1182a5eb3 in libtorch.dylib)
frame #7: at::Tensor::view(c10::ArrayRef<long long>) const + 341 (0x1182a2ad5 in libtorch.dylib)
frame #8: torch::autograd::VariableType::(anonymous namespace)::view(at::Tensor const&, c10::ArrayRef<long long>) + 1566 (0x11ad0fdae in libtorch.dylib)
frame #9: c10::detail::wrap_kernel_functor_unboxed_<c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(at::Tensor const&, c10::ArrayRef<long long>), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::ArrayRef<long long> > >, at::Tensor (at::Tensor const&, c10::ArrayRef<long long>)>::call(c10::OperatorKernel*, at::Tensor const&, c10::ArrayRef<long long>) + 24 (0x11883cef8 in libtorch.dylib)
frame #10: at::Tensor c10::KernelFunction::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(at::Tensor const&, c10::ArrayRef<long long>) const + 63 (0x11f28decf in _backend.cpython-37m-darwin.so)
frame #11: std::__1::result_of<at::Tensor (ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>::type c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > >::read<at::Tensor c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<long long>) const::'lambda'(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>(at::Tensor&&) const + 168 (0x11f28de18 in _backend.cpython-37m-darwin.so)
frame #12: std::__1::result_of<at::Tensor (c10::DispatchTable const&)>::type c10::LeftRight<c10::DispatchTable>::read<at::Tensor c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<long long>) const::'lambda'(c10::DispatchTable const&)>(at::Tensor&&) const + 118 (0x11f28dd06 in _backend.cpython-37m-darwin.so)
frame #13: at::Tensor::view(c10::ArrayRef<long long>) const + 97 (0x11f28dac1 in _backend.cpython-37m-darwin.so)
frame #14: normalize_shape(at::Tensor const&) + 145 (0x11f28da31 in _backend.cpython-37m-darwin.so)
frame #15: std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> backward_reduce_impl<float, (Activation)0>(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, float) + 576 (0x11f28a1e0 in _backend.cpython-37m-darwin.so)
frame #16: backward_reduce_cpu(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float) + 194 (0x11f275d62 in _backend.cpython-37m-darwin.so)
frame #17: backward_reduce(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float) + 783 (0x11f28f73f in _backend.cpython-37m-darwin.so)
frame #18: void pybind11::cpp_function::initialize<std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*&)(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float), std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float, pybind11::name, pybind11::scope, pybind11::sibling, char [32]>(std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*&)(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float), std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*)(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [32])::'lambda'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const + 109 (0x11f2ad70d in _backend.cpython-37m-darwin.so)
frame #19: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3088 (0x11f29d960 in _backend.cpython-37m-darwin.so)
<omitting python frames>
frame #32: torch::autograd::PyNode::apply(std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> >&&) + 578 (0x1176fb5f2 in libtorch_python.dylib)
frame #33: torch::autograd::Node::operator()(std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> >&&) + 464 (0x11aed48e0 in libtorch.dylib)
frame #34: torch::autograd::Engine::evaluate_function(std::__1::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 1381 (0x11aecc495 in libtorch.dylib)
frame #35: torch::autograd::Engine::thread_main(std::__1::shared_ptr<torch::autograd::GraphTask> const&, bool) + 532 (0x11aecb364 in libtorch.dylib)
frame #36: torch::autograd::Engine::thread_init(int) + 152 (0x11aecb118 in libtorch.dylib)
frame #37: torch::autograd::python::PythonEngine::thread_init(int) + 44 (0x1176f5cec in libtorch_python.dylib)
frame #38: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (torch::autograd::Engine::*)(int), torch::autograd::Engine*, int> >(void*) + 66 (0x11aed8a72 in libtorch.dylib)
frame #39: _pthread_start + 148 (0x7fff67ec7e65 in libsystem_pthread.dylib)
frame #40: thread_start + 15 (0x7fff67ec383b in libsystem_pthread.dylib)

I removed my own calls to view() and used contiguous() to try to force the tensors to be contiguous, but it still reaches backward_reduce() and crashes there. Is there any known issue with running this on a CPU-only machine? I'm doing this because I want to do some trial runs on my local machine before trying it out on the remote machine.
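For reference, the error message itself is the standard PyTorch complaint about calling .view() on a non-contiguous tensor; a standalone snippet (nothing to do with inplace_abn internals, just to illustrate the failure mode) produces the same message:

```python
import torch

# A transposed tensor shares storage with the original but is not contiguous,
# so .view() cannot reinterpret its memory without a copy.
x = torch.randn(4, 8).t()      # shape (8, 4), non-contiguous
print(x.is_contiguous())       # False

try:
    x.view(-1)                 # raises: "view size is not compatible with input tensor's size and stride"
except RuntimeError as e:
    print(e)

print(x.reshape(-1).shape)            # reshape() copies when a pure view is impossible
print(x.contiguous().view(-1).shape)  # or make the layout contiguous first
```

In my case the offending view seems to happen inside the extension's backward path (normalize_shape / backward_reduce), not in my own code.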

ducksoup commented 4 years ago

I tried reproducing the issue on my machine with no success (I just ran a forward/backward sequence on CPU). Could you please put together a minimal reproducible example for me to debug?

PS: we don't have access to macOS machines, so if this is something macOS-specific I'm afraid I won't be able to help you.
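In the meantime, something along these lines should work as a skeleton for a repro (a minimal sketch, assuming the package's InPlaceABN layer; the conv layer, channel count, and shapes are arbitrary choices, not taken from your setup):

```python
import torch
from inplace_abn import InPlaceABN

# Minimal CPU forward/backward through InPlaceABN. The conv in front mirrors
# normal usage and avoids modifying a leaf tensor in place.
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
abn = InPlaceABN(16)           # 16 channels, default leaky_relu activation

x = torch.randn(2, 3, 8, 8)    # arbitrary small NCHW batch
y = abn(conv(x))
y.mean().backward()            # backward_reduce is exercised here on CPU

print(conv.weight.grad.norm())
```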