zhanghang1989 / PyTorch-Encoding

A CV toolkit for my papers.
https://hangzhang.org/PyTorch-Encoding/
MIT License

activation_kernel.cu(21): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag #251

Closed. MSC19950601 closed this issue 4 years ago.

MSC19950601 commented 4 years ago

activation_kernel.cu(21): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

What is the problem?

Originally posted by @zhuizhunew in https://github.com/zhanghang1989/PyTorch-Encoding/issues/66#issuecomment-520147082

Same issue!
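The error message names its own fix: nvcc only accepts __host__/__device__ annotations on lambdas (as used with thrust::transform_if in activation_kernel.cu) when --expt-extended-lambda is passed. The nvcc command line in the build log further down includes --expt-relaxed-constexpr but not that flag. Below is a minimal sketch of a setup.py that forwards it, assuming the extension is built with PyTorch's torch.utils.cpp_extension.CUDAExtension from the directory containing operator.cpp and activation_kernel.cu. The file layout and the gpu_sources helper are hypothetical, not the project's actual setup script; newer CUDA releases also accept the spelling --extended-lambda.

```python
# Hypothetical setup.py sketch for building enclib_gpu with extended-lambda
# support. The source layout (all *.cpp / *.cu files in one directory) is an
# assumption, not the project's actual build script.
import glob
import os

# Extra flags forwarded to nvcc. PyTorch's BuildExtension adds its own
# defaults (e.g. --expt-relaxed-constexpr, -D__CUDA_NO_HALF_OPERATORS__);
# these entries are appended to that list.
NVCC_FLAGS = [
    "--expt-extended-lambda",  # allow __host__/__device__ annotations on lambdas
]

# With a dict, BuildExtension uses "cxx" for C++ files and "nvcc" for .cu
# files, so both keys are provided even if "cxx" stays empty.
EXTRA_COMPILE_ARGS = {
    "cxx": [],
    "nvcc": NVCC_FLAGS,
}


def gpu_sources(src_dir="."):
    """Collect the extension's C++ and CUDA sources (hypothetical layout)."""
    cpp = glob.glob(os.path.join(src_dir, "*.cpp"))
    cu = glob.glob(os.path.join(src_dir, "*.cu"))
    return sorted(cpp + cu)


def main():
    # Imported lazily so the flag definitions above can be inspected
    # without torch being installed.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    setup(
        name="enclib_gpu",
        ext_modules=[
            CUDAExtension(
                name="enclib_gpu",
                sources=gpu_sources(),
                extra_compile_args=EXTRA_COMPILE_ARGS,
            )
        ],
        cmdclass={"build_ext": BuildExtension},
    )


if __name__ == "__main__":
    main()
```

Running `python setup.py install` in that directory should then show --expt-extended-lambda on the nvcc line for activation_kernel.cu. If it does not appear there, the flag did not reach the compiler.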

zhanghang1989 commented 4 years ago

I am making a new version, which is in PR https://github.com/zhanghang1989/PyTorch-Encoding/pull/256, with new setup instructions.

This new PR will be merged soon. Let me know if you still have the issue.

MSC19950601 commented 4 years ago

Sorry, I still run into problems. My environment is Ubuntu 18.04, torch 1.4.0, and CUDA 10.1. I installed torch-encoding from the GitHub source, and the whole installation went fine. But when I install lib/gpu (enclib_gpu) manually, the build fails. Here is the log:

running install
running bdist_egg
running egg_info
writing enclib_gpu.egg-info/PKG-INFO
writing dependency_links to enclib_gpu.egg-info/dependency_links.txt
writing top-level names to enclib_gpu.egg-info/top_level.txt
reading manifest file 'enclib_gpu.egg-info/SOURCES.txt'
writing manifest file 'enclib_gpu.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'enclib_gpu' extension
gcc -pthread -B /home/kururu/anaconda3/envs/kururudev-torchdev/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/TH -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/include/python3.6m -c operator.cpp -o build/temp.linux-x86_64-3.6/operator.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=enclib_gpu -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/TH -I/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/kururu/anaconda3/envs/kururudev-torchdev/include/python3.6m -c activation_kernel.cu -o build/temp.linux-x86_64-3.6/activation_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=enclib_gpu -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++11

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(14): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(15): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(15): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(15): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(18): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(19): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(19): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(19): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(23): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(24): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(24): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/functional/padding.h(24): warning: integer conversion resulted in a change of sign

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/autograd/profiler.h(97): warning: attribute "visibility" does not apply here

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/autograd/profiler.h(112): warning: attribute "visibility" does not apply here

/home/kururu/anaconda3/envs/kururudev-torchdev/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/enum.h(179): warning: statement is unreachable

activation_kernel.cu(20): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

activation_kernel.cu(21): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

activation_kernel.cu(23): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

activation_kernel.cu(24): error: __host__ or __device__ annotation on lambda requires --expt-extended-lambda nvcc flag

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->double", defined at activation_kernel.cu:20) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, 
thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(100): here instantiation of "ForwardIterator thrust::transform_if(const 
thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(214): here instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" activation_kernel.cu(21): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=double]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->nv_bool", defined at activation_kernel.cu:21) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, 
thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(100): here instantiation of "ForwardIterator thrust::transform_if(const 
thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(214): here instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" activation_kernel.cu(21): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=double]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->double", defined at activation_kernel.cu:20) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, 
thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(100): here instantiation of "ForwardIterator thrust::transform_if(const 
thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(214): here instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" activation_kernel.cu(21): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=double]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->nv_bool", defined at activation_kernel.cu:21) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, 
thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(100): here instantiation of "ForwardIterator thrust::transform_if(const 
thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(214): here instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" activation_kernel.cu(21): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=double]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->double", defined at activation_kernel.cu:23) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const 
double &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::transform::no_stencil_tag, 
TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(80): here instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(188): here instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" activation_kernel.cu(24): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=double]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->__nv_bool", defined at activation_kernel.cu:24) cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
  detected during:
    instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t> (926): here
    instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]" (1077): here
    instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
    instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
    instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
    instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
    instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
    instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(80): here
    instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(188): here
    instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" activation_kernel.cu(24): here
    instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const double &)->double", defined at activation_kernel.cu:23) cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
  detected during:
    instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t> (926): here
    instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]" (1077): here
    instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
    instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
    instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::__transform::no_stencil_tag, lambda [](const double &)->double, lambda [](const double &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
    instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
    instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::__transform::no_stencil_tag, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here
    instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(80): here
    instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(188): here
    instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const double &)->double, Predicate=lambda [](const double &)->__nv_bool]" activation_kernel.cu(24): here
    instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=double]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->float", defined at activation_kernel.cu:20) cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
  detected during:
    instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t> (926): here
    instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]" (1077): here
    instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
    instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
    instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
    instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
    instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(100): here
    instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(214): here
    instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" activation_kernel.cu(21): here
    instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->__nv_bool", defined at activation_kernel.cu:21) cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified
  detected during:
    instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t> (926): here
    instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]" (1077): here
    instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here
    instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here
    instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::device_ptr, lambda [](const float &)->float, lambda [](const float &)->__nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here
    instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here
    instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(100): here
    instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(214): here
    instantiation of "ForwardIterator thrust::transform_if(InputIterator1, InputIterator1, InputIterator2, ForwardIterator, UnaryFunction, Predicate) [with InputIterator1=thrust::device_ptr, InputIterator2=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" activation_kernel.cu(21): here
    instantiation of "void ::leaky_relu_backward_impl(T *, T *, float, int64_t) [with T=float]" activation_kernel.cu(36): here


/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->float", defined at activation_kernel.cu:23) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float 
&)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const 
float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(80): here instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(188): here instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" activation_kernel.cu(24): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=float]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->nv_bool", defined at activation_kernel.cu:24) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float 
&)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const 
float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(80): here instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(188): here instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" activation_kernel.cu(24): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=float]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->float", defined at activation_kernel.cu:23) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float 
&)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::__transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const 
float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(80): here instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(188): here instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->__nv_bool]" activation_kernel.cu(24): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=float]" activation_kernel.cu(36): here

/usr/local/cuda/include/thrust/system/cuda/detail/core/agent_launcher.h(926): error: The closure type for a lambda ("lambda [](const float &)->nv_bool", defined at activation_kernel.cu:24) cannot be used in the template argument type of a global function template instantiation, unless the lambda is defined within a device or global function, or the lambda is an 'extended lambda' and the flag --expt-extended-lambda is specified detected during: instantiation of "thrust::cuda_cub::core::_kernel_agent" based on template arguments <thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t>, thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t> (926): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch_impl(thrust::detail::true_type, _0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, _1=std::ptrdiff_t]" (1077): here instantiation of "void thrust::cuda_cub::core::AgentLauncher::launch(_0, _1) const [with Agent=thrust::cuda_cub::parallel_for::ParallelForAgent<thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float 
&)->nv_bool>, std::ptrdiff_t>, _0=thrust::cuda_cub::__transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, _1=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(142): here instantiation of "cudaError_t thrust::cuda_cub::parallel_for::parallel_for(Size, F, cudaStream_t) [with F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h(164): here instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::transform::unary_transform_f<thrust::device_ptr, thrust::device_ptr, thrust::cuda_cub::transform::no_stencil_tag, lambda [](const float &)->float, lambda [](const float &)->nv_bool>, Size=std::ptrdiff_t]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(232): here instantiation of "OutputIt thrust::cuda_cub::transform::unary(Policy &, InputIt, OutputIt, Size, StencilIt, TransformOp, Predicate) [with Policy=thrust::cuda_cub::execution_policy, InputIt=thrust::device_ptr, Size=std::ptrdiff_t, OutputIt=thrust::device_ptr, StencilIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(309): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, StencilInputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, StencilInputIt=thrust::cuda_cub::transform::no_stencil_tag, TransformOp=lambda [](const 
float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/system/cuda/detail/transform.h(331): here instantiation of "OutputIt thrust::cuda_cub::transform_if(thrust::cuda_cub::execution_policy &, InputIt, InputIt, OutputIt, TransformOp, Predicate) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::device_ptr, OutputIt=thrust::device_ptr, TransformOp=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(80): here instantiation of "ForwardIterator thrust::transform_if(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" /usr/local/cuda/include/thrust/detail/transform.inl(188): here instantiation of "ForwardIterator thrust::transform_if(InputIterator, InputIterator, ForwardIterator, UnaryFunction, Predicate) [with InputIterator=thrust::device_ptr, ForwardIterator=thrust::device_ptr, UnaryFunction=lambda [](const float &)->float, Predicate=lambda [](const float &)->nv_bool]" activation_kernel.cu(24): here instantiation of "void ::leaky_relu_backward_impl(T , T , float, int64_t) [with T=float]" activation_kernel.cu(36): here

20 errors detected in the compilation of "/tmp/tmpxft_00006a79_00000000-6_activation_kernel.cpp1.ii". error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
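All 20 errors point to the same cause. `activation_kernel.cu` passes lambdas (defined at lines 21, 23 and 24) to `thrust::transform_if`. nvcc accepts this only for "extended lambdas" compiled with `--expt-extended-lambda`, and the nvcc command in the log above does not include that flag.

Below is a minimal sketch of merging the flag into the nvcc options. It assumes the extension is built with `torch.utils.cpp_extension.CUDAExtension`, whose `extra_compile_args` may be a dict with separate `'cxx'` and `'nvcc'` lists. The helper `with_extended_lambda` is hypothetical and is not part of PyTorch-Encoding's setup script.

```python
# Hypothetical helper (not from PyTorch-Encoding): return a copy of an
# extra_compile_args dict for CUDAExtension with --expt-extended-lambda
# added to the nvcc options, without duplicating it.
import copy

EXTENDED_LAMBDA_FLAG = "--expt-extended-lambda"


def with_extended_lambda(extra_compile_args=None):
    """Return a new dict whose 'nvcc' list contains --expt-extended-lambda."""
    args = copy.deepcopy(extra_compile_args) if extra_compile_args else {}
    nvcc_flags = list(args.get("nvcc", []))
    if EXTENDED_LAMBDA_FLAG not in nvcc_flags:
        nvcc_flags.append(EXTENDED_LAMBDA_FLAG)
    args["nvcc"] = nvcc_flags
    # The dict form is indexed by both keys at build time, so to be safe
    # make sure a (possibly empty) 'cxx' entry exists as well.
    args.setdefault("cxx", [])
    return args


if __name__ == "__main__":
    # In a real setup.py this dict would be passed as
    # CUDAExtension(..., extra_compile_args=<this dict>).
    print(with_extended_lambda({"cxx": ["-O3"], "nvcc": ["-O3"]}))
```

In a `setup.py`, the result would go into `CUDAExtension(...)` together with `cmdclass={'build_ext': BuildExtension}`. Treat this only as a workaround sketch: the full source list and other flags of the real `enclib_gpu` build are not shown in this thread.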

MSC19950601 commented 4 years ago

Also, when I use only encoding.nn.SyncBatchNorm to test a model, my Python script hangs: it fills GPU memory but shows no GPU utilization.

MSC19950601 commented 4 years ago

BTW, I only installed ninja 1.8.2 in my Python environment, not system-wide. Does that matter?
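As far as we understand PyTorch's extension tooling, the install location probably matters less than one thing: whether the process running the build can find and run the `ninja` executable on its `PATH`. (Note that the compile log above shows setuptools invoking `gcc` and `nvcc` directly.)

The sketch below checks this from inside the same environment using only the standard library. The helper `ninja_status` is hypothetical.

```python
import shutil
import subprocess


def ninja_status():
    """Return (path, version) for the ninja executable found on PATH.

    Returns (None, None) if no ninja is on PATH, and (path, None) if it
    was found but could not be run.
    """
    path = shutil.which("ninja")
    if path is None:
        return None, None
    try:
        # stdout/stderr=PIPE + universal_newlines also works on Python 3.6.
        out = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, check=True,
                             universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return path, None
    return path, out.stdout.strip()


if __name__ == "__main__":
    path, version = ninja_status()
    if path is None:
        print("ninja not found on PATH")
    else:
        print("ninja at", path, "version", version)
```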

zhanghang1989 commented 4 years ago

I am not an expert in system setup. I haven't tried Ubuntu 18.04 or CUDA 10.1.

My setup is Ubuntu 16.04 and CUDA 10.0 with PyTorch 1.4.0. If you use the same setup, you can follow the steps here: https://hangzhang.org/PyTorch-Encoding/notes/compile.html
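The maintainer's tested setup (PyTorch 1.4.0, CUDA 10.0) differs from the reporter's (CUDA 10.1), so it may help to print the versions actually in use before comparing.

This is a minimal sketch:

* `describe_env`, `mismatches` and `TESTED` are hypothetical names.
* `TESTED` encodes only the PyTorch and CUDA versions stated in the comment above. The Ubuntu release is not compared.
* `torch.version.cuda` is the CUDA version the PyTorch binary was built with. The extension itself is compiled by the toolkit at `/usr/local/cuda`, so it is worth checking `nvcc --version` as well.

```python
import platform

# Versions the maintainer reports testing with (from the comment above).
TESTED = {"torch": "1.4.0", "cuda": "10.0"}


def describe_env():
    """Collect Python, OS, PyTorch and PyTorch's CUDA version (None if unavailable)."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return info
    # Drop a local suffix such as "+cu92" so "1.4.0+cu92" compares as "1.4.0".
    info["torch"] = str(torch.__version__).split("+")[0]
    info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    return info


def mismatches(info, tested=TESTED):
    """Return {key: (found, tested)} for every tested key that differs."""
    return {k: (info.get(k), v) for k, v in tested.items() if info.get(k) != v}


if __name__ == "__main__":
    env = describe_env()
    print(env)
    print("differs from tested setup:", mismatches(env))
```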

MSC19950601 commented 4 years ago

Thank you for your patient explanation. However, after following your installation steps, nothing changed: when I use only encoding.nn.SyncBatchNorm to test a model, my Python script hangs, filling GPU memory but showing no GPU utilization.

zhanghang1989 commented 4 years ago

That's weird. See https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/nn/syncbn.py#L175-L176

In eval mode, it should use the standard BN forward pass.
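For reference, the standard (eval-mode) batch-norm forward uses only the stored running statistics, so it needs no communication between GPUs. Assuming the linked lines in `syncbn.py` dispatch to it when `self.training` is False, this is why a hang during evaluation is unexpected.

Below is a minimal pure-Python sketch of that computation for a single channel on a plain list. It is an illustration only, not PyTorch-Encoding's code, and `batch_norm_eval` is a hypothetical name.

```python
import math


def batch_norm_eval(xs, running_mean, running_var, weight=1.0, bias=0.0, eps=1e-5):
    """Eval-mode batch norm for one channel.

    Uses the stored running statistics instead of batch statistics, so the
    result depends only on local data: nothing is reduced across devices.
    """
    scale = weight / math.sqrt(running_var + eps)
    return [(x - running_mean) * scale + bias for x in xs]


if __name__ == "__main__":
    print(batch_norm_eval([1.0, 2.0, 3.0], running_mean=2.0, running_var=1.0, eps=0.0))
```

In practice this path is taken only after `model.eval()` has been called before testing, which sets each module's `training` flag to False.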

zhanghang1989 commented 4 years ago

Are you using the most recent version of the code?

zhanghang1989 commented 4 years ago

Could you try

pip install torch-encoding --pre

which installs the most recent version, including pre-releases.
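To confirm which release is actually installed, here is a minimal sketch using the standard-library `importlib.metadata`:

* `importlib.metadata` requires Python 3.8+. The reporter's environment uses Python 3.6, where the `importlib_metadata` backport would be needed instead.
* The distribution name `torch-encoding` is taken from the pip command above.
* `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="torch-encoding"):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("torch-encoding:", installed_version() or "not installed")
```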

MSC19950601 commented 4 years ago

Thanks for your patient explanation. I am already using the most recent version.

zhanghang1989 commented 4 years ago

Is your issue related to PyCharm, like this one? https://github.com/zhanghang1989/PyTorch-Encoding/issues/260