openhackathons-org / gpubootcamp

This repository consists of GPU bootcamp material for HPC and AI
Apache License 2.0
513 stars 254 forks

[HPC] [NWays-challenge] stdpar implementation issue #95

Open mozhgan-kch opened 2 years ago

mozhgan-kch commented 2 years ago

Error reported for the stdpar implementation:

nvc++ -std=c++17 -stdpar=gpu -lm -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt -c jacobi.cpp
"/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include/thrust/system/detail/generic/for_each.h", line 48: error: static assertion failed with "unimplemented for this system"
    THRUST_STATIC_ASSERT_MSG(
    ^
          detected during:
            instantiation of "InputIterator thrust::system::detail::generic::for_each(thrust::execution_policy<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::detail::execute_with_allocator<thrust::mr::allocator<char, thrust::mr::disjoint_unsynchronized_pool_resource<thrust::device_memory_resource, thrust::mr::new_delete_resource>>, thrust::cuda_cub::execute_on_stream_base>, InputIterator=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, UnaryFunction=lambda [](unsigned int)->void]" at line 44 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include/thrust/detail/for_each.inl"
            instantiation of "InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::detail::execute_with_allocator<thrust::mr::allocator<char, thrust::mr::disjoint_unsynchronized_pool_resource<thrust::device_memory_resource, thrust::mr::new_delete_resource>>, thrust::cuda_cub::execute_on_stream_base>, InputIterator=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, UnaryFunction=lambda [](unsigned int)->void]" at line 1035 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/include/nvhpc/algorithm_execution.hpp"
            instantiation of "void std::__pstl::__algorithm_wrapper_struct<true>::for_each(_FIt, _FIt, _UF) [with _FIt=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, _UF=lambda [](unsigned int)->void]" at line 2136 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/include/nvhpc/algorithm_execution.hpp"
            instantiation of "std::__pstl::__enable_if_EP<_EP, void> std::for_each(_EP &&, _FIt, _FIt, _UF) [with _EP=const std::execution::parallel_policy &, _FIt=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, _UF=lambda [](unsigned int)->void]" at line 11 of "jacobi.cpp"

1 error detected in the compilation of "jacobi.cpp".
make: *** [Makefile:41: jacobi.o] Error 2

The issue appears to come from the following function:

void jacobistep(double *psinew, double *psi, int m, int n)
{
    std::for_each(std::execution::par,
                  thrust::counting_iterator<unsigned int>(1u),
                  thrust::counting_iterator<unsigned int>(m),
                  [psinew, psi, m, n](unsigned int i) {
                      for (int j = 1; j <= n; j++)
                      {
                          psinew[i*(m+2)+j] = 0.25*(psi[(i-1)*(m+2)+j] + psi[(i+1)*(m+2)+j]
                                                  + psi[i*(m+2)+j-1] + psi[i*(m+2)+j+1]);
                      }
                  });
}

This needs investigation to recreate the issue.
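
As a starting point for recreating this outside the lab container, here is a minimal sketch that keeps the same stencil and index layout but replaces thrust::counting_iterator with a plain index vector, so it builds without Thrust. The names jacobistep_ref, jacobistep_idx, versions_agree and the USE_STDPAR macro are hypothetical and not part of the lab code; the square-grid helper is only there to compare the two versions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>
#ifdef USE_STDPAR   // hypothetical macro, not taken from the lab's Makefile
#include <execution>
#endif

// Serial reference with the same stencil and index layout as the snippet above.
void jacobistep_ref(double *psinew, const double *psi, int m, int n)
{
    // The iterator range in the issue is half-open, [1, m), so row m is not
    // visited. That is mirrored here; the issue does not say if it is intended.
    for (int i = 1; i < m; i++) {
        for (int j = 1; j <= n; j++) {
            psinew[i*(m+2)+j] = 0.25*(psi[(i-1)*(m+2)+j] + psi[(i+1)*(m+2)+j]
                                    + psi[i*(m+2)+j-1] + psi[i*(m+2)+j+1]);
        }
    }
}

// Algorithm-based variant: an index vector stands in for the counting iterator.
void jacobistep_idx(double *psinew, const double *psi, int m, int n)
{
    std::vector<int> rows(m > 1 ? m - 1 : 0);
    std::iota(rows.begin(), rows.end(), 1);   // rows = 1, 2, ..., m-1

    auto body = [psinew, psi, m, n](int i) {
        for (int j = 1; j <= n; j++) {
            psinew[i*(m+2)+j] = 0.25*(psi[(i-1)*(m+2)+j] + psi[(i+1)*(m+2)+j]
                                    + psi[i*(m+2)+j-1] + psi[i*(m+2)+j+1]);
        }
    };

#ifdef USE_STDPAR
    std::for_each(std::execution::par, rows.begin(), rows.end(), body);
#else
    std::for_each(rows.begin(), rows.end(), body);
#endif
}

// Hypothetical check: run both versions on a square m x m grid and compare.
bool versions_agree(int m)
{
    assert(m >= 1);
    const std::size_t size = static_cast<std::size_t>(m + 2) * (m + 2);
    std::vector<double> psi(size), a(size, 0.0), b(size, 0.0);
    for (std::size_t k = 0; k < size; k++)
        psi[k] = std::sin(0.1 * static_cast<double>(k));

    jacobistep_ref(a.data(), psi.data(), m, m);
    jacobistep_idx(b.data(), psi.data(), m, m);

    for (std::size_t k = 0; k < size; k++)
        if (std::fabs(a[k] - b[k]) > 1e-12) return false;
    return true;
}
```

Built without USE_STDPAR, this should compile with any C++17 compiler and gives a host-side baseline. Defining USE_STDPAR switches to the parallel policy, which is the form to try under nvc++ with -stdpar=gpu; whether that build avoids the static assertion above has not been verified here.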

mozhgan-kch commented 2 years ago

It looks like adding the -cuda flag to the compiler options solves this: https://forums.developer.nvidia.com/t/device-code-generated-from-stdpar-versus-thrust/196172/9

This needs checking to make sure the container is the same as the one used in the nways lab.

mozhgan-kch commented 2 years ago

Check whether the deltasq calculation is wrong. The code might also be missing various includes (unless we put them in jacobi.h?)