PZ_PMMH and cuda compilation

BlackEdder commented 6 years ago

Disclaimer: This might well be a LibBi error, but because I first ran into it with the PZ_PMMH example, I submitted it here.

I am trying to recreate the PZ_PMMH demo, but with CUDA enabled, as follows:

library(rbi)
demo(PZ_PMMH)
bi_object <- sample(bi_object, obs=synthetic_dataset, init=init_parameters,
                     end_time=T, noutputs=T, nsamples=128, nparticles=128, options=list("cuda"=TRUE),
                     nthreads=1, log_file_name=tempfile(pattern="pmmhoutput", fileext=".txt"))

but I run into the following error (from the make.log file):

/usr/local/cuda/include/thrust/system/cuda/detail/extrema.h(395): error: no suitable user-defined conversion from "std::tuple<thrust::permutation_iterator<thrust::device_ptr<const real>, thrust::transform_iterator<bi::strided_functor<std::ptrdiff_t>, thrust::counting_iterator<std::ptrdiff_t, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>>, thrust::cuda_cub::counting_iterator_t<signed long>>" to "iterator_tuple" exists
          detected during:
            instantiation of "ItemsIt thrust::cuda_cub::__extrema::element<ArgFunctor,Derived,ItemsIt,BinaryPred>(thrust::cuda_cub::execution_policy<Derived> &, ItemsIt, ItemsIt, BinaryPred) [with ArgFunctor=thrust::cuda_cub::__extrema::arg_max_f, Derived=thrust::cuda_cub::tag, ItemsIt=thrust::permutation_iterator<thrust::device_ptr<const real>, thrust::transform_iterator<bi::strided_functor<std::ptrdiff_t>, thrust::counting_iterator<std::ptrdiff_t, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>>, BinaryPred=bi::nan_less_functor<real>]" 
(475): here
            instantiation of "ItemsIt thrust::cuda_cub::max_element(thrust::cuda_cub::execution_policy<Derived> &, ItemsIt, ItemsIt, BinaryPred) [with Derived=thrust::cuda_cub::tag, ItemsIt=thrust::permutation_iterator<thrust::device_ptr<const real>, thrust::transform_iterator<bi::strided_functor<std::ptrdiff_t>, thrust::counting_iterator<std::ptrdiff_t, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>>, BinaryPred=bi::nan_less_functor<real>]" 
/usr/local/cuda/include/thrust/detail/extrema.inl(65): here
            instantiation of "ForwardIterator thrust::max_element(const thrust::detail::execution_policy_base<DerivedPolicy> &, ForwardIterator, ForwardIterator, BinaryPredicate) [with DerivedPolicy=thrust::cuda_cub::tag, ForwardIterator=thrust::permutation_iterator<thrust::device_ptr<const real>, thrust::transform_iterator<bi::strided_functor<std::ptrdiff_t>, thrust::counting_iterator<std::ptrdiff_t, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>>, BinaryPredicate=bi::nan_less_functor<real>]" 
/usr/local/cuda/include/thrust/detail/extrema.inl(139): here
            instantiation of "ForwardIterator thrust::max_element(ForwardIterator, ForwardIterator, BinaryPredicate) [with ForwardIterator=thrust::permutation_iterator<thrust::device_ptr<const real>, thrust::transform_iterator<bi::strided_functor<std::ptrdiff_t>, thrust::counting_iterator<std::ptrdiff_t, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>>, BinaryPredicate=bi::nan_less_functor<real>]" 
src/bi/state/../primitive/vector_primitive.hpp(872): here
            instantiation of "V1::value_type bi::max_reduce(V1) [with V1=bi::gpu_vector_reference<real, -1, -1>]" 
src/bi/state/../primitive/vector_primitive.hpp(968): here
            instantiation of "V1::value_type bi::ess_reduce(V1, double *) [with V1=bi::gpu_vector_reference<real, -1, -1>]" 
src/bi/filter/../resampler/Resampler.hpp(160): here
            instantiation of "double bi::Resampler<R>::reduce(V1, double *) [with R=bi::SystematicResampler, V1=bi::gpu_vector_reference<real, -1, -1>]" 
src/bi/filter/BootstrapPF.hpp(180): here
            instantiation of "void bi::BootstrapPF<B, F, O, R>::correct(bi::Random &, bi::ScheduleElement, S1 &) [with B=ModelPZ_model3cca73e8a634, F=bi::Forcer<bi::InputNullBuffer, bi::ON_DEVICE>, O=bi::Observer<bi::InputNetCDFBuffer, bi::ON_DEVICE>, R=bi::Resampler<bi::SystematicResampler>, S1=bi::BootstrapPFState<ModelPZ_model3cca73e8a634, bi::ON_DEVICE>]" 
src/bi/filter/Filter.hpp(65): here
            instantiation of "void bi::Filter<F>::filter(bi::Random &, bi::ScheduleIterator, bi::ScheduleIterator, S1 &, IO1 &) [with F=bi::BootstrapPF<ModelPZ_model3cca73e8a634, bi::Forcer<bi::InputNullBuffer, bi::ON_DEVICE>, bi::Observer<bi::InputNetCDFBuffer, bi::ON_DEVICE>, bi::Resampler<bi::SystematicResampler>>, S1=bi::BootstrapPFState<ModelPZ_model3cca73e8a634, bi::ON_DEVICE>, IO1=bi::ParticleFilterBuffer<bi::BootstrapPFCache<bi::ON_DEVICE, bi::ParticleFilterNullBuffer>>]" 
src/bi/sampler/MarginalMH.hpp(224): here
            instantiation of "void bi::MarginalMH<B, F>::init(bi::Random &, bi::ScheduleIterator, bi::ScheduleIterator, S1 &, IO1 &, IO2 &) [with B=ModelPZ_model3cca73e8a634, F=bi::Filter<bi::BootstrapPF<ModelPZ_model3cca73e8a634, bi::Forcer<bi::InputNullBuffer, bi::ON_DEVICE>, bi::Observer<bi::InputNetCDFBuffer, bi::ON_DEVICE>, bi::Resampler<bi::SystematicResampler>>>, S1=bi::BootstrapPFState<ModelPZ_model3cca73e8a634, bi::ON_DEVICE>, IO1=bi::ParticleFilterBuffer<bi::BootstrapPFCache<bi::ON_DEVICE, bi::ParticleFilterNullBuffer>>, IO2=bi::InputNetCDFBuffer]" 
src/bi/sampler/MarginalMH.hpp(206): here
            instantiation of "void bi::MarginalMH<B, F>::sample(bi::Random &, bi::ScheduleIterator, bi::ScheduleIterator, S1 &, int, IO1 &, IO2 &) [with B=ModelPZ_model3cca73e8a634, F=bi::Filter<bi::BootstrapPF<ModelPZ_model3cca73e8a634, bi::Forcer<bi::InputNullBuffer, bi::ON_DEVICE>, bi::Observer<bi::InputNetCDFBuffer, bi::ON_DEVICE>, bi::Resampler<bi::SystematicResampler>>>, S1=bi::MarginalMHState<ModelPZ_model3cca73e8a634, bi::ON_DEVICE, bi::BootstrapPFState<ModelPZ_model3cca73e8a634, bi::ON_DEVICE>, bi::ParticleFilterBuffer<bi::BootstrapPFCache<bi::ON_DEVICE, bi::ParticleFilterNullBuffer>>>, IO1=bi::MCMCBuffer<bi::MCMCCache<bi::ON_DEVICE, bi::MCMCNetCDFBuffer>>, IO2=bi::InputNetCDFBuffer]" 
src/sample_cpu.cpp(715): here

depbase=`echo src/bi/cuda/random/RandomKernel.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'` && \
srcbase=`echo src/bi/cuda/random/RandomKernel.o | sed 's|/[^/]*$||'` && \
perl nvcc_wrapper.pl /usr/local/cuda-10.0/bin/nvcc -ccbin=g++ -M -w -arch sm_30 -Xcompiler="-fopenmp -O3 -g3 -funroll-loops  " -Isrc  -I/usr/local/cuda/include  -DENABLE_CUDA  -DCUDA_FAST_MATH=0    -DENABLE_OPENMP    -DPACKAGE_NAME=\"LibBi\" -DPACKAGE_TARNAME=\"libbi\" -DPACKAGE_VERSION=\"1.4.2\" -DPACKAGE_STRING=\"LibBi\ 1.4.2\" -DPACKAGE_BUGREPORT=\"bug-report@libbi.org\" -DPACKAGE_URL=\"http://www.libbi.org\" -DHAVE_OMP_H=1 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_OMP -DHAVE_LIBM=1 -DHAVE_LIBGFORTRAN=1 -DHAVE_LIBATLAS=1 -DHAVE_LIBQRUPDATE=1 -DHAVE_LIBGSL=1 -DHAVE_LIBNETCDF=1 -DHAVE_LIBCUDA=1 -DHAVE_LIBCUDART=1 -DHAVE_LIBCURAND=1 -DHAVE_LIBCUBLAS=1 -DHAVE_NETCDF_H=1 -DHAVE_CBLAS_H=1 -DHAVE_GSL_GSL_CBLAS_H=1 -DHAVE_BOOST_MPL_IF_HPP=1 -DHAVE_BOOST_RANDOM_BINOMIAL_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_BERNOULLI_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_GAMMA_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_MERSENNE_TWISTER_HPP=1 -DHAVE_BOOST_RANDOM_NORMAL_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_POISSON_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_UNIFORM_INT_HPP=1 -DHAVE_BOOST_RANDOM_UNIFORM_REAL_HPP=1 -DHAVE_BOOST_RANDOM_VARIATE_GENERATOR_HPP=1 -DHAVE_BOOST_TYPEOF_TYPEOF_HPP=1 -DHAVE_THRUST_ADJACENT_DIFFERENCE_H=1 -DHAVE_THRUST_BINARY_SEARCH_H=1 -DHAVE_THRUST_COPY_H=1 -DHAVE_THRUST_DEVICE_PTR_H=1 -DHAVE_THRUST_DISTANCE_H=1 -DHAVE_THRUST_EXTREMA_H=1 -DHAVE_THRUST_FILL_H=1 -DHAVE_THRUST_FOR_EACH_H=1 -DHAVE_THRUST_FUNCTIONAL_H=1 -DHAVE_THRUST_GATHER_H=1 -DHAVE_THRUST_INNER_PRODUCT_H=1 -DHAVE_THRUST_ITERATOR_COUNTING_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_DETAIL_NORMAL_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_DISCARD_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_PERMUTATION_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_TRANSFORM_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_ZIP_ITERATOR_H=1 -DHAVE_THRUST_LOGICAL_H=1 -DHAVE_THRUST_REDUCE_H=1 -DHAVE_THRUST_SCAN_H=1 -DHAVE_THRUST_SEQUENCE_H=1 -DHAVE_THRUST_SORT_H=1 -DHAVE_THRUST_TRANSFORM_H=1 -DHAVE_THRUST_TRANSFORM_REDUCE_H=1 
-DHAVE_THRUST_TRANSFORM_SCAN_H=1 -DHAVE_THRUST_TUPLE_H=1 -DHAVE_GSL_GSL_MULTIMIN_H=1 -DHAVE_CUBLAS_V2_H=1 -DHAVE_CURAND_H=1 -DENABLE_DIAGNOSTICS=no -DBOOST_NOINLINE -odir $srcbase -o $depbase.Tpo src/bi/cuda/random/RandomKernel.cu && \
perl nvcc_wrapper.pl /usr/local/cuda-10.0/bin/nvcc -ccbin=g++ -c -w -arch sm_30 -Xcompiler="-fopenmp -O3 -g3 -funroll-loops  " -Isrc  -I/usr/local/cuda/include  -DENABLE_CUDA  -DCUDA_FAST_MATH=0    -DENABLE_OPENMP    -DPACKAGE_NAME=\"LibBi\" -DPACKAGE_TARNAME=\"libbi\" -DPACKAGE_VERSION=\"1.4.2\" -DPACKAGE_STRING=\"LibBi\ 1.4.2\" -DPACKAGE_BUGREPORT=\"bug-report@libbi.org\" -DPACKAGE_URL=\"http://www.libbi.org\" -DHAVE_OMP_H=1 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_OMP -DHAVE_LIBM=1 -DHAVE_LIBGFORTRAN=1 -DHAVE_LIBATLAS=1 -DHAVE_LIBQRUPDATE=1 -DHAVE_LIBGSL=1 -DHAVE_LIBNETCDF=1 -DHAVE_LIBCUDA=1 -DHAVE_LIBCUDART=1 -DHAVE_LIBCURAND=1 -DHAVE_LIBCUBLAS=1 -DHAVE_NETCDF_H=1 -DHAVE_CBLAS_H=1 -DHAVE_GSL_GSL_CBLAS_H=1 -DHAVE_BOOST_MPL_IF_HPP=1 -DHAVE_BOOST_RANDOM_BINOMIAL_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_BERNOULLI_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_GAMMA_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_MERSENNE_TWISTER_HPP=1 -DHAVE_BOOST_RANDOM_NORMAL_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_POISSON_DISTRIBUTION_HPP=1 -DHAVE_BOOST_RANDOM_UNIFORM_INT_HPP=1 -DHAVE_BOOST_RANDOM_UNIFORM_REAL_HPP=1 -DHAVE_BOOST_RANDOM_VARIATE_GENERATOR_HPP=1 -DHAVE_BOOST_TYPEOF_TYPEOF_HPP=1 -DHAVE_THRUST_ADJACENT_DIFFERENCE_H=1 -DHAVE_THRUST_BINARY_SEARCH_H=1 -DHAVE_THRUST_COPY_H=1 -DHAVE_THRUST_DEVICE_PTR_H=1 -DHAVE_THRUST_DISTANCE_H=1 -DHAVE_THRUST_EXTREMA_H=1 -DHAVE_THRUST_FILL_H=1 -DHAVE_THRUST_FOR_EACH_H=1 -DHAVE_THRUST_FUNCTIONAL_H=1 -DHAVE_THRUST_GATHER_H=1 -DHAVE_THRUST_INNER_PRODUCT_H=1 -DHAVE_THRUST_ITERATOR_COUNTING_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_DETAIL_NORMAL_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_DISCARD_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_PERMUTATION_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_TRANSFORM_ITERATOR_H=1 -DHAVE_THRUST_ITERATOR_ZIP_ITERATOR_H=1 -DHAVE_THRUST_LOGICAL_H=1 -DHAVE_THRUST_REDUCE_H=1 -DHAVE_THRUST_SCAN_H=1 -DHAVE_THRUST_SEQUENCE_H=1 -DHAVE_THRUST_SORT_H=1 -DHAVE_THRUST_TRANSFORM_H=1 -DHAVE_THRUST_TRANSFORM_REDUCE_H=1 
-DHAVE_THRUST_TRANSFORM_SCAN_H=1 -DHAVE_THRUST_TUPLE_H=1 -DHAVE_GSL_GSL_MULTIMIN_H=1 -DHAVE_CUBLAS_V2_H=1 -DHAVE_CURAND_H=1 -DENABLE_DIAGNOSTICS=no -DBOOST_NOINLINE -o src/bi/cuda/random/RandomKernel.o src/bi/cuda/random/RandomKernel.cu && \
cat $depbase.Tpo > $depbase.Po && \
rm -f $depbase.Tpo
1 error detected in the compilation of "/tmp/tmpxft_00007040_00000000-6_sample_gpu.cpp1.ii".
Makefile:1441: recipe for target 'src/sample_gpu.o' failed
make: *** [src/sample_gpu.o] Error 1
make: *** Waiting for unfinished jobs....

This is on Ubuntu 18.04 with CUDA 10.

sbfnk commented 6 years ago

I haven't come across this before (not having tried CUDA 10). Can you see if you get the same issue with CUDA 9? You could try the rbi-gpu docker image (which uses CUDA 9):

docker run -it sbfnk/rbi-gpu

BlackEdder commented 6 years ago

I tried the docker image and got the following error

Error: CUDA driver version is insufficient for CUDA runtime version
sample: src/bi/misc/omp.cpp:43: void bi_omp_init(int): Assertion `cudaErr == cudaSuccess' failed.
Aborted (core dumped)

Related to that, I also found https://github.com/NVIDIA/nvidia-docker/issues/700. Following that, I tried calling it with --runtime=nvidia, but then I get the following error:

docker run -it --runtime=nvidia sbfnk/rbi-gpu
docker: Error response from daemon: Unknown runtime specified nvidia.

I'll try to downgrade cuda locally next.

BlackEdder commented 6 years ago

With cuda-9.2 I get the same error

BlackEdder commented 6 years ago

After some more digging, it looks like the NVIDIA GPU is too old for CUDA >= 9.0, so the problem is on my end.

sbfnk commented 6 years ago

I'm re-opening this as I'm not convinced the original error is a driver/compatibility issue. The std::tuple in there seems misplaced. I'd like to try and reproduce this - what compiler/version were you using?

Also, it's a long shot, but could you go to the offending line (/usr/local/cuda/include/thrust/system/cuda/detail/extrema.h:395) and replace make_tuple with thrust::make_tuple, to see if you get the same error?
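
For context on why qualifying the call might matter, here is a minimal C++ sketch of how an unqualified make_tuple call inside a library can resolve to std::make_tuple through argument-dependent lookup (ADL). It does not use Thrust: the namespace mini and the names pair_like, wrapper, unqualified and qualified are hypothetical stand-ins, and whether this is the mechanism behind the error at extrema.h:395 (and how namespace std would become associated with the LibBi iterator types) is an assumption, not something established in this thread.

```cpp
#include <functional>
#include <tuple>
#include <type_traits>

namespace mini {  // hypothetical stand-in for a library namespace

// The library's own tuple-like type and factory function.
template <class A, class B>
struct pair_like { A first; B second; };

template <class A, class B>
pair_like<A, B> make_tuple(const A& a, const B& b) { return pair_like<A, B>{a, b}; }

// A wrapper whose template argument may come from another namespace.
template <class Inner>
struct wrapper { Inner inner; };

// Unqualified call: ADL also searches the namespaces associated with It,
// including those of its template arguments.
template <class It>
auto unqualified(It it) { return make_tuple(it, 0); }

// Qualified call: only mini::make_tuple is considered.
template <class It>
auto qualified(It it) { return mini::make_tuple(it, 0); }

}  // namespace mini

// std::plus<int> makes namespace std an associated namespace of W.
using W = mini::wrapper<std::plus<int>>;

// With a std type involved, the forwarding-reference overload
// std::make_tuple is the better match and wins overload resolution.
static_assert(std::is_same<decltype(mini::unqualified(W{})),
                           std::tuple<W, int>>::value,
              "unqualified call resolves to std::make_tuple");

// The qualified call always yields the library's own type.
static_assert(std::is_same<decltype(mini::qualified(W{})),
                           mini::pair_like<W, int>>::value,
              "qualified call resolves to mini::make_tuple");

// Without any std type among the template arguments, ADL finds only mini.
static_assert(std::is_same<decltype(mini::unqualified(mini::wrapper<int*>{})),
                           mini::pair_like<mini::wrapper<int*>, int>>::value,
              "no std association, so mini::make_tuple is used");
```

If something like this is happening, the unqualified call would return a std::tuple where the surrounding code expects the library's own tuple type, which would be consistent with the "no suitable user-defined conversion from std::tuple<...> to iterator_tuple" message. Qualifying the call removes ADL from the picture, which is why the thrust::make_tuple edit is a cheap experiment.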

sbfnk / rbi

PZ_PMMH and cuda compilation #15