xiaotaiye / cusp-library

Automatically exported from code.google.com/p/cusp-library
Apache License 2.0

Failure with sacusppoly preconditioner with petsc #79

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Build petsc-dev with cuda support using cusp/thrust
2. Build and run src/ksp/ksp/examples/tutorials/ex2.
3. See the attached log file.

What is the expected output? What do you see instead?

Expected: a successful run to completion.  Actual: the run fails with an "invalid device pointer" error.

What version of the product are you using? On what operating system?

cusp-0.2.0, thrust-1.4.0, cuda-4.0 release on CentOS 5.6

Please provide any additional information below.

Original issue reported on code.google.com by dnystr...@comcast.net on 24 Oct 2011 at 6:21

Attachments:

GoogleCodeExporter commented 8 years ago
What command did you use to build petsc?

Original comment by filipe.c...@gmail.com on 24 Oct 2011 at 7:04

GoogleCodeExporter commented 8 years ago
See the attached file for the build script that I used.  I have cusp-0.2.0
and thrust-1.4.0 installed in /usr/local/cuda/include.  Also, note that I
am able to use the sacusp and ainvcusp preconditioners in petsc
successfully.

Original comment by dnystr...@comcast.net on 24 Oct 2011 at 2:01

GoogleCodeExporter commented 8 years ago
I think the attached file didn't come through.

Original comment by filipe.c...@gmail.com on 24 Oct 2011 at 3:35

GoogleCodeExporter commented 8 years ago
Try this.

Original comment by dnystr...@comcast.net on 24 Oct 2011 at 4:44

Attachments:

GoogleCodeExporter commented 8 years ago
Did this answer your questions?  I think the problem is fairly reproducible.  
I've reproduced it with at least a couple of snapshots of petsc-dev.  I also 
get the same result on a personal petsc application.

Original comment by dnystr...@comcast.net on 24 Oct 2011 at 7:13

GoogleCodeExporter commented 8 years ago
It did. I'm still compiling petsc.

Original comment by filipe.c...@gmail.com on 24 Oct 2011 at 7:21

GoogleCodeExporter commented 8 years ago
I managed to reproduce it, but it goes away when I compile petsc in debug mode 
(--with-debugging=yes).
Can you confirm this?

Original comment by filipe.c...@gmail.com on 24 Oct 2011 at 8:47

GoogleCodeExporter commented 8 years ago
I have not tried compiling petsc in debug mode because I have been more focused 
on testing various solvers and preconditioners on my particular application.  
Did you want me to try and reproduce the debug behavior?  I could do that this 
evening when I go home.

Original comment by dnystr...@comcast.net on 24 Oct 2011 at 10:01

GoogleCodeExporter commented 8 years ago
Exactly.

Original comment by filipe.c...@gmail.com on 24 Oct 2011 at 10:26

GoogleCodeExporter commented 8 years ago
Exactly what?  Do you want me to try and reproduce the debug behavior?

Original comment by dnystr...@comcast.net on 24 Oct 2011 at 11:36

GoogleCodeExporter commented 8 years ago
Yes.

Original comment by filipe.c...@gmail.com on 24 Oct 2011 at 11:40

GoogleCodeExporter commented 8 years ago
OK.

Original comment by dnystr...@comcast.net on 25 Oct 2011 at 12:23

GoogleCodeExporter commented 8 years ago
I have now built the latest petsc-dev as of today, 12/3/11, with both debugging 
turned on and separately with optimized code.  When I build and run 
src/ksp/ksp/examples/tutorials/ex2, the problem runs fine when I'm using the 
version of petsc-dev with debugging turned on.  When I use the optimized 
version of petsc-dev, I receive the following error message at the end of the 
run:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device pointer
Command terminated by signal 6

I'm running the petsc ex2 example with the following command line:

ex2 -m 1600 -n 1600 -ksp_type cg -pc_type sacusppoly -mat_type aijcusp 
-vec_type cusp -log_summary

If instead I run the following command:

ex2 -m 1600 -n 1600 -ksp_type cg -pc_type sacusp -mat_type aijcusp -vec_type 
cusp -log_summary

then the example runs fine with the optimized version.  I also ran the above 
test case with -pc_type ainvcusp, and that runs fine to completion as well.  
So the problem appears to be isolated to the case of -pc_type sacusppoly.

I hope this gives you enough information to figure out what the problem is.

Thanks,

Dave

Original comment by dnystr...@comcast.net on 3 Dec 2011 at 10:44

GoogleCodeExporter commented 8 years ago
So how is this going?  Any progress?

Original comment by dnystr...@comcast.net on 6 Dec 2011 at 11:55

GoogleCodeExporter commented 8 years ago
This thread appears to be dead.

Original comment by dnystr...@comcast.net on 12 Dec 2011 at 9:28

GoogleCodeExporter commented 8 years ago
Have you tried to contact the developers of petsc? This might well be a problem 
with petsc and without knowing the details of petsc it's hard to figure out.

Original comment by filipe.c...@gmail.com on 12 Dec 2011 at 10:06

GoogleCodeExporter commented 8 years ago
I actually contacted the Petsc team first on this issue.  They indicated that 
they were just using cusp and that I should contact the cusp team for problems 
related to cusp.  So the problem could be with the Petsc integration of cusp in 
which case it would be their problem to fix.  Or the problem could be in cusp 
in which case it would be a problem for the cusp team to fix.  But I don't know 
which of these cases corresponds to this problem and so far, neither team seems 
able or willing to investigate enough to figure out where the problem lies.  I 
guess I will ping the Petsc team again and see if someone is willing to 
investigate.

Original comment by dnystr...@comcast.net on 13 Dec 2011 at 4:41

GoogleCodeExporter commented 8 years ago
So I have had some more interaction with the PETSc Team.  The PETSc Team 
currently believes this is a problem with cusp.  Are you in contact with the 
PETSc Team?  It seems kind of silly for me to be in the middle of this 
interaction.  Please let me know if there are any plans to look at this 
problem.

Thanks,

Dave

Original comment by dnystr...@comcast.net on 20 Dec 2011 at 4:45

GoogleCodeExporter commented 8 years ago
Hi Dave,

The best way to handle these situations is to reduce the problem down to a 
minimal program (usually < 100 lines of code) that (1) reproduces the bug and 
(2) is free of external dependencies (i.e. only depends on Thrust/Cusp and 
nvcc).  With such an example in hand we can usually resolve the underlying 
problem quickly.

Original comment by wnbell on 20 Dec 2011 at 5:02

GoogleCodeExporter commented 8 years ago
Do you have any regression tests or other tests which lead you to believe that 
sacusppoly works fine in a cusp/thrust only environment?  At this point in time,
I am only using cusp through the petsc interface.  While there might be 
occasion for me to use it without the petsc interface in the future, that 
scenario does not seem near term.

BTW, I still do not have an answer to the question in my previous post so I 
will ask it again.  Are you, the Cusp Team, in contact with the PETSc Team on 
this issue?

Thanks,

Dave

Original comment by dnystr...@comcast.net on 20 Dec 2011 at 6:01

GoogleCodeExporter commented 8 years ago
Here's the backtrace. It looks like an uninitialized size(), but I still don't 
know where.

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device pointer

Program received signal SIGABRT, Aborted.
0x00002aaabe742265 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00002aaabe742265 in raise () from /lib64/libc.so.6
#1  0x00002aaabe743d10 in abort () from /lib64/libc.so.6
#2  0x00002aaabe4c9305 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../gcc-4.4.2/libstdc++-v3/libsupc++/vterminate.cc:93
#3  0x00002aaabe4c7736 in __cxxabiv1::__terminate (handler=0x4aff)
    at ../../../../gcc-4.4.2/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4  0x00002aaabe4c7763 in std::terminate ()
    at ../../../../gcc-4.4.2/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00002aaabe4c785e in __cxxabiv1::__cxa_throw (obj=<value optimized out>, 
    tinfo=<value optimized out>, dest=<value optimized out>)
    at ../../../../gcc-4.4.2/libstdc++-v3/libsupc++/eh_throw.cc:83
#6  0x00002aaaaaef7140 in void 
thrust::detail::backend::cuda::free<0u>(thrust::device_ptr<void>) ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#7  0x00002aaaaaef716d in void 
thrust::detail::backend::dispatch::free<0u>(thrust::device_ptr<void>, 
thrust::detail::cuda_device_space_tag) ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#8  0x00002aaaaaef7498 in thrust::detail::contiguous_storage<double, 
thrust::device_malloc_allocator<double> >::deallocate() ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#9  0x00002aaaaaef74c9 in thrust::detail::contiguous_storage<double, 
thrust::device_malloc_allocator<double> >::~contiguous_storage() ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#10 0x00002aaaaaf0e718 in thrust::detail::vector_base<double, 
thrust::device_malloc_allocator<double> >::~vector_base() ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#11 0x00002aaaab3b4534 in cusp::precond::smoothed_aggregation<int, double, 
thrust::detail::cuda_device_space_tag>::level::~level() ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#12 0x00002aaaab3e79ca in PCDestroy_SACUSPPoly(_p_PC*) ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#13 0x00002aaaab37664d in PCDestroy ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#14 0x00002aaaab1eb366 in KSPDestroy ()
   from /global/homes/f/filipe/src/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/lib/libpetsc.so
#15 0x0000000000402770 in main (argc=14, args=0x7fffffffd408) at ex2.c:227
(gdb) 

Original comment by filipe.c...@gmail.com on 21 Feb 2012 at 9:00