Open nullbull opened 6 months ago
[11:51:13] Top 10 stacks with outstanding allocations:
1056 bytes in 1 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
1344 bytes in 7 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
1920 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
3520 bytes in 5 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
3520 bytes in 5 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
4560 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
5280 bytes in 55 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
5280 bytes in 5 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
19008 bytes in 99 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
23150 bytes in 125 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:19] Top 10 stacks with outstanding allocations:
1920 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
1920 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
3840 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
7040 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
7040 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
7392 bytes in 7 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
9120 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
10560 bytes in 110 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
37824 bytes in 197 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
46300 bytes in 250 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:24] Top 10 stacks with outstanding allocations:
2880 bytes in 15 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
2880 bytes in 15 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
5760 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
9504 bytes in 9 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
10560 bytes in 15 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
10560 bytes in 15 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
13680 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
15840 bytes in 165 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
56832 bytes in 296 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
69450 bytes in 375 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:29] Top 10 stacks with outstanding allocations:
3840 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
3840 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
7680 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
13728 bytes in 13 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
14080 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
14080 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
18240 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
21120 bytes in 220 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
75840 bytes in 395 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
92600 bytes in 500 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:34] Top 10 stacks with outstanding allocations:
4800 bytes in 25 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
4800 bytes in 25 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
9600 bytes in 50 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
15840 bytes in 15 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
17600 bytes in 25 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
17600 bytes in 25 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
22800 bytes in 50 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
26400 bytes in 275 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
94656 bytes in 493 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
115750 bytes in 625 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:39] Top 10 stacks with outstanding allocations:
5760 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
5776 bytes in 19 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
11520 bytes in 60 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
19008 bytes in 18 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
21120 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
21120 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
27360 bytes in 60 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
31680 bytes in 330 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
113664 bytes in 592 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
138900 bytes in 750 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:44] Top 10 stacks with outstanding allocations:
6720 bytes in 35 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
6720 bytes in 35 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
13440 bytes in 70 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
23232 bytes in 22 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
24640 bytes in 35 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
24640 bytes in 35 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
31920 bytes in 70 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
36960 bytes in 385 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
132672 bytes in 691 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
162050 bytes in 875 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:49] Top 10 stacks with outstanding allocations:
7680 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
7680 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
15360 bytes in 80 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
25344 bytes in 24 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
28160 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
28160 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
36480 bytes in 80 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
42240 bytes in 440 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
151680 bytes in 790 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
185200 bytes in 1000 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:54] Top 10 stacks with outstanding allocations:
8640 bytes in 45 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
8640 bytes in 45 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
17280 bytes in 90 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
28512 bytes in 27 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
31680 bytes in 45 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
31680 bytes in 45 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
41040 bytes in 90 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
47520 bytes in 495 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
170688 bytes in 889 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
208350 bytes in 1125 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:59] Top 10 stacks with outstanding allocations:
0 bytes in 48 allocations from stack
[unknown]
256 bytes in 8 allocations from stack
[unknown]
416 bytes in 2 allocations from stack
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
704 bytes in 2 allocations from stack
[unknown]
[11:52:04] Top 10 stacks with outstanding allocations:
0 bytes in 48 allocations from stack
[unknown]
256 bytes in 8 allocations from stack
[unknown]
416 bytes in 2 allocations from stack
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
704 bytes in 2 allocations from stack
[unknown]
[11:52:09] Top 10 stacks with outstanding allocations:
0 bytes in 48 allocations from stack
[unknown]
256 bytes in 8 allocations from stack
[unknown]
416 bytes in 2 allocations from stack
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
704 bytes in 2 allocations from stack
[unknown]
[11:52:14] Top 10 stacks with outstanding allocations:
0 bytes in 48 allocations from stack
[unknown]
256 bytes in 8 allocations from stack
[unknown]
416 bytes in 2 allocations from stack
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
704 bytes in 2 allocations from stack
[unknown]
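The snapshots above can also be compared mechanically. In the log, the largest stack grows by 23150 bytes and 125 allocations in every 5-second interval between 11:51:13 and 11:51:54. Below is a minimal sketch of a parser that sums the outstanding bytes per snapshot, assuming only the line format visible above (a `[HH:MM:SS] Top 10 stacks ...` header followed by `N bytes in M allocations from stack` lines). The names `snapshot` and `parseSnapshots` are made up for this illustration and do not come from any tool.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// snapshot holds the summed totals of one "Top 10 stacks" report.
type snapshot struct {
	Time   string
	Bytes  int64
	Allocs int64
}

// parseSnapshots sums the "N bytes in M allocations from stack" lines
// found under each "[HH:MM:SS] Top 10 stacks ..." header.
// Hypothetical helper: it assumes the log format shown in this thread.
func parseSnapshots(log string) []snapshot {
	var out []snapshot
	sc := bufio.NewScanner(strings.NewReader(log))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // symbolized frames can be very long lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// A header line starts a new snapshot; "[unknown]" frames do not match.
		if strings.HasPrefix(line, "[") && strings.Contains(line, "] Top ") {
			end := strings.Index(line, "]")
			out = append(out, snapshot{Time: line[1:end]})
			continue
		}
		var b, a int64
		n, _ := fmt.Sscanf(line, "%d bytes in %d allocations from stack", &b, &a)
		if n == 2 && len(out) > 0 {
			out[len(out)-1].Bytes += b
			out[len(out)-1].Allocs += a
		}
	}
	return out
}

func main() {
	// Excerpt: the largest stack from the first two snapshots above.
	sample := `[11:51:13] Top 10 stacks with outstanding allocations:
23150 bytes in 125 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:19] Top 10 stacks with outstanding allocations:
46300 bytes in 250 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
`
	snaps := parseSnapshots(sample)
	for i, s := range snaps {
		var delta int64
		if i > 0 {
			delta = s.Bytes - snaps[i-1].Bytes
		}
		fmt.Printf("%s bytes=%d allocs=%d delta=%+d\n", s.Time, s.Bytes, s.Allocs, delta)
	}
}
```

A roughly constant positive delta between consecutive snapshots, as in this log, is what steady per-iteration growth looks like. A delta that levels off would instead suggest caching or allocator warm-up.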
@nullbull,
Thanks for reporting this. It would be much appreciated if you could provide the model test_full_save.pt used in your example. Even better if you could pinpoint which function causes the memory leak. Feel free to send a PR if you can, as I am in a very slow-response mode. Thank you!
Can you give me an email address? I can send the model to you. I tried to find which function it is, but it's hard for me because I am not familiar with C++.
But I think any model may have memory leaks.
@nullbull,
Please fork the repository, add your model file to a release or to the examples, and share your fork.
Alternatively, sharing it via Google Drive, Dropbox, or any public file-sharing service would be great. Thanks.
@nullbull,
I have quickly tested your example, putting the forward pass inside ts.NoGrad() to completely shut down gradient accumulation (the ts.Randn() op sets grad to true by default). I also increased the size of the tensor to expose any modest leak, and it seems to be fine. Memory on my box seems to be stable for at least 1M cycles.
package main

import (
	"fmt"

	"github.com/sugarme/gotch"
	"github.com/sugarme/gotch/ts"
)

func main() {
	TestModel()
}

func TestModel() {
	N := 1_000_000_000
	m, err := ts.ModuleLoad("test_full_save.pt")
	if err != nil {
		panic(err)
	}
	m.SetEval()
	for i := 0; i < N; i++ {
		// tf := ts.MustRand([]int64{1, 7}, gotch.Float, gotch.CPU)
		tf := ts.MustRand([]int64{1024, 7}, gotch.Float, gotch.CPU)
		ts.NoGrad(func() {
			res, err := m.Forward(tf)
			if err != nil {
				panic(err)
			}
			res.MustDrop()
		})
		tf.MustDrop()
		if i%1000 == 0 {
			fmt.Printf("Done %d \n", i)
		}
	}
}
Please always handle errors as well. Let me know if that works on your box.
Note that when calling forward() in a for-loop, particularly in Go on CPU, we should expect some spiky fluctuation in memory consumption.
@sugarme I used valgrind and it still finds a memory leak. I re-wrote your code and found through stress testing that memory is still growing, even though the QPS has not increased.
@sugarme My service handles over 5000 QPS per node, so it easily reaches 1M cycles. It's a 20C/32G node.
@nullbull,
I would try the following things:
1. Create tf outside the for-loop. Something like:
tf := ts.MustRand([]int64{1024, 7}, gotch.Float, gotch.CPU)
for i := 0; i < N; i++ {
	ts.NoGrad(func() {
		res, err := m.Forward(tf)
		if err != nil {
			panic(err)
		}
		res.MustDrop()
	})
	// tf.MustDrop()
	if i%1000 == 0 {
		fmt.Printf("Done %d \n", i)
	}
}
If there is no leak, then the problem is in the tensor initialization ts.MustRand.
2. The for-loop itself could be the problem. Depending on how you compose your server, try a real use case rather than a for-loop?
@sugarme Sorry, I was on my way home just now. The service using gotch is already online. The cluster is now restarted every day to make sure there is no OOM. The service code is not a for-loop; the model is evaluated once per request. I used valgrind to run 100 loops and it detected a memory leak of 18B. I also did a stress test for 2 days last week: the service memory grew to 95% of the node's memory, then the service hit OOM and restarted. Here is my online code. A request sends a [][]float64 array and I need to convert it to a [][]float64 tensor:
xPredict := tensors["x"].([][]float32)
for _, v := range xPredict {
	modelInput = append(modelInput, v...)
}
tf := ts.MustOfSlice(modelInput).MustView([]int64{int64(len(xPredict)), int64(len(xPredict[0]))}, true)
forward, err := model.Forward(tf)
if err != nil {
	log.V2.Error().Str("local inf fail").With(ctx).Error(err).Emit()
} else {
	//toString, _ := forward.ToString(10)
	log.V2.Info().With(ctx).Str("local inf success").Emit()
}
I did a stress test for an hour, and memory went from 40.6% to 43.4%. After the requests completed, memory dropped back to 5.6%. In theory, memory should be 0.x% without requests, because my service is very simple and has no local cache; it only does model inference.
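The flattening step in the request handler above can be isolated and checked without libtorch. The following is a minimal sketch, assuming the input is meant to be rectangular. `flatten2D` is a hypothetical helper, not part of gotch. It builds a fresh buffer on every call and rejects empty or ragged input before any tensor is created, which is the case where `len(xPredict[0])` would otherwise panic.

```go
package main

import (
	"errors"
	"fmt"
)

// flatten2D copies a rectangular [][]float32 into one contiguous slice and
// returns the {rows, cols} shape that a later view/reshape call would need.
// Hypothetical helper for illustration; it is not part of gotch.
func flatten2D(rows [][]float32) ([]float32, []int64, error) {
	if len(rows) == 0 || len(rows[0]) == 0 {
		return nil, nil, errors.New("empty input")
	}
	cols := len(rows[0])
	// Fresh buffer per call, so nothing accumulates across requests.
	flat := make([]float32, 0, len(rows)*cols)
	for i, r := range rows {
		if len(r) != cols {
			return nil, nil, fmt.Errorf("row %d has %d values, want %d", i, len(r), cols)
		}
		flat = append(flat, r...)
	}
	return flat, []int64{int64(len(rows)), int64(cols)}, nil
}

func main() {
	flat, shape, err := flatten2D([][]float32{{1, 2, 3}, {4, 5, 6}})
	if err != nil {
		panic(err)
	}
	fmt.Println(flat, shape)
}
```

The returned slice and shape would then be passed to ts.MustOfSlice and MustView as in the snippet above. Separately, sugarme's example earlier in this thread releases both the input tensor and the forward result with MustDrop(), while the handler snippet shows no such calls for tf or forward. Whether that accounts for the growth is not confirmed anywhere in this thread, but it may be worth checking.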
@nullbull,
I suspect it's related to the closed issue #102. It would be great if you could try the suggested solution:
func Rand(...) (...) {
	var untypedPtr uintptr
	ptr := (*lib.Ctensor)(unsafe.Pointer(&untypedPtr))
	// Some C call that stores an allocated tensor at *ptr.
	retVal = &Tensor{ctensor: *ptr}
	return retVal, err
}
Thank you.
@sugarme I tried, but it does not work. You can run valgrind: it still reports a memory leak, and in my stress test memory is still going up.
@sugarme With the above changes applied, there is still a memory leak. Please help me solve it.
We use gotch for our online services, but we find that the server's RSS (the memory usage indicator) keeps rising. I suspect a memory leak: Go's pprof shows the program using only a few dozen MB, yet the RSS keeps rising. After trying many methods, I finally confirmed through valgrind that there is indeed a cgo memory leak.
Here is my test code:
Below is the memory leak information that valgrind found. The following command was executed:
valgrind --leak-check=full ./model_test
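The gap described above, a Go heap profile of a few dozen MB next to a steadily rising RSS, can also be tracked from inside the process. The Go runtime only accounts for memory it allocated itself, so memory allocated by C/C++ code through cgo shows up in RSS but not in pprof's heap numbers. Below is a minimal sketch that prints both figures, assuming a Linux host where /proc/self/status is available. `parseVmRSS` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseVmRSS extracts the resident set size in kB from the text of
// /proc/<pid>/status. It returns false if no usable VmRSS line is present.
func parseVmRSS(status string) (int64, bool) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["VmRSS:", "123456", "kB"]
		if len(fields) < 2 {
			return 0, false
		}
		kb, err := strconv.ParseInt(fields[1], 10, 64)
		return kb, err == nil
	}
	return 0, false
}

func main() {
	// Memory the Go runtime knows about (what a pprof heap profile is based on).
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("Go heap in use: %d kB\n", ms.HeapInuse/1024)

	// Memory the kernel attributes to the whole process, including cgo allocations.
	data, err := os.ReadFile("/proc/self/status") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	if kb, ok := parseVmRSS(string(data)); ok {
		fmt.Printf("process RSS:    %d kB\n", kb)
	}
}
```

Logging both values periodically during a stress test would show whether the growth is outside the Go heap, which is consistent with what the valgrind run suggests.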