horstsoft0815 closed this issue 4 weeks ago
@horstsoft0815 thanks for bringing this up.
Do you have a reproducer for your problem?
Sorry it took me a while; this example should reproduce the issue. The CSR matrix is split across the index files col_csr_dump.dat and colPtr_csr_dump.dat, with the values in val_csr_dump.dat. It is then loaded by the example hypre_interp.cpp. I tried to build the example in the same fashion as in our software: we have a pre-existing matrix and don't want to copy it, so I hand the relevant pointers to the corresponding hypre structures and set their flags to non-owning. hypre_example.zip
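For reference, the non-owning setup follows roughly this pattern (a minimal sketch, not the code from hypre_example.zip; the helper name wrap_csr and the explicit host memory location are assumptions for illustration):

```c
#include "_hypre_utilities.h"
#include "seq_mv.h"

/* Wrap existing CSR arrays (colPtr/col/val, as in the dump files) in a
 * hypre_CSRMatrix without copying them. */
hypre_CSRMatrix* wrap_csr(HYPRE_Int n, HYPRE_Int nnz,
                          HYPRE_Int *colPtr,    /* row pointers, size n+1   */
                          HYPRE_Int *col,       /* column indices, size nnz */
                          HYPRE_Complex *val)   /* values, size nnz         */
{
   /* Create only the shell; hypre_CSRMatrixInitialize is NOT called,
    * since it would allocate the i/j/data arrays internally. */
   hypre_CSRMatrix *A = hypre_CSRMatrixCreate(n, n, nnz);

   hypre_CSRMatrixI(A)    = colPtr;
   hypre_CSRMatrixJ(A)    = col;
   hypre_CSRMatrixData(A) = val;

   /* Mark the arrays as externally owned so hypre won't free them. */
   hypre_CSRMatrixSetDataOwner(A, 0);

   /* In recent hypre versions the memory location must match where the
    * arrays actually live (here: host memory). */
   hypre_CSRMatrixMemoryLocation(A) = HYPRE_MEMORY_HOST;

   return A;
}
```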
@horstsoft0815 which software are you working on? Is it supposed to work in parallel (MPI)?
The software is a commercial prototype of a charging simulation for the semiconductor industry. The bug was encountered without MPI/CUDA, even though HYPRE was compiled with CUDA support:
```cmake
set(HYPRE_ENABLE_SHARED OFF)
set(HYPRE_ENABLE_BIGINT OFF)
set(HYPRE_ENABLE_MIXEDINT OFF)
set(HYPRE_ENABLE_SINGLE OFF)
set(HYPRE_ENABLE_LONG_DOUBLE OFF)
set(HYPRE_ENABLE_COMPLEX OFF)
set(HYPRE_ENABLE_HYPRE_BLAS ON)
set(HYPRE_ENABLE_HYPRE_LAPACK ON)
set(HYPRE_ENABLE_PERSISTENT_COMM OFF)
set(HYPRE_ENABLE_FEI OFF)
set(HYPRE_WITH_MPI OFF)
set(HYPRE_WITH_OPENMP OFF)
set(HYPRE_WITH_HOPSCOTCH OFF)
set(HYPRE_USING_DSUPERLU OFF)
set(HYPRE_USING_MAGMA OFF)
set(HYPRE_WITH_CALIPER OFF)
set(HYPRE_PRINT_ERRORS OFF)
set(HYPRE_TIMING OFF)
set(HYPRE_BUILD_EXAMPLES OFF)
set(HYPRE_BUILD_TESTS OFF)
set(HYPRE_USING_HOST_MEMORY OFF)
set(HYPRE_WITH_CUDA ON)
set(HYPRE_WITH_SYCL OFF)
set(HYPRE_ENABLE_UNIFIED_MEMORY OFF)
set(HYPRE_ENABLE_CUDA_STREAMS ON)
set(HYPRE_ENABLE_CUSPARSE ON)
set(HYPRE_ENABLE_DEVICE_POOL OFF)
set(HYPRE_ENABLE_CUBLAS ON)
set(HYPRE_ENABLE_CURAND ON)
set(HYPRE_ENABLE_GPU_PROFILING OFF)
set(HYPRE_ENABLE_ONEMKLSPARSE OFF)
set(HYPRE_ENABLE_ONEMKLBLAS OFF)
set(HYPRE_ENABLE_ONEMKLRAND OFF)
set(HYPRE_WITH_UMPIRE OFF)
set(HYPRE_WITH_UMPIRE_HOST OFF)
set(HYPRE_WITH_UMPIRE_DEVICE OFF)
set(HYPRE_WITH_UMPIRE_UM OFF)
set(HYPRE_WITH_UMPIRE_PINNED OFF)
```
Thanks! Does the bug happen when compiling hypre without CUDA support?
Yes, it also happens when hypre is compiled without CUDA support.
@horstsoft0815 there is something wrong with your conversion routines. I've updated your code to use the IJ interface for setting matrices and vectors (that's what we recommend doing). You can find the new implementation here: issue1133_solution.tar.gz
Please let me know if this issue can be closed.
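For readers following along, the recommended IJ-interface pattern looks roughly like this (a minimal sketch, not the exact contents of issue1133_solution.tar.gz; the helper name csr_to_parcsr and the single-rank row range are assumptions):

```c
#include "HYPRE.h"
#include "HYPRE_IJ_mv.h"
#include "HYPRE_parcsr_mv.h"

/* Build a ParCSR matrix from CSR arrays (colPtr/col/val, as in the dump
 * files) through the IJ interface. The IJ matrix owns the ParCSR object,
 * so keep ij_A alive while parcsr_A is in use and free it afterwards
 * with HYPRE_IJMatrixDestroy. */
void csr_to_parcsr(HYPRE_Int n, HYPRE_Int *colPtr,
                   HYPRE_BigInt *col, HYPRE_Real *val,
                   HYPRE_IJMatrix *ij_A, HYPRE_ParCSRMatrix *parcsr_A)
{
   HYPRE_BigInt row;

   /* Single rank: this process owns global rows and columns [0, n-1]. */
   HYPRE_IJMatrixCreate(MPI_COMM_WORLD, 0, n - 1, 0, n - 1, ij_A);
   HYPRE_IJMatrixSetObjectType(*ij_A, HYPRE_PARCSR);
   HYPRE_IJMatrixInitialize(*ij_A);

   /* Insert one CSR row at a time. */
   for (row = 0; row < n; row++)
   {
      HYPRE_Int ncols = colPtr[row + 1] - colPtr[row];
      HYPRE_IJMatrixSetValues(*ij_A, 1, &ncols, &row,
                              &col[colPtr[row]], &val[colPtr[row]]);
   }

   HYPRE_IJMatrixAssemble(*ij_A);
   HYPRE_IJMatrixGetObject(*ij_A, (void **) parcsr_A);
}
```

Note that this path copies the values into hypre's own storage during SetValues/Assemble.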
Here is the preconditioner output:
```
$ ./hypre_interp_v2
 Num MPI tasks = 1
 Num OpenMP threads = 1

BoomerAMG SETUP PARAMETERS:

 Max levels = 25
 Num levels = 6

 Strength Threshold = 0.250000
 Interpolation Truncation Factor = 0.000000
 Maximum Row Sum Threshold for Dependency Weakening = 0.900000

 Coarsening Type = PMIS
 measures are determined locally

 No global partition option chosen.

 Operator Matrix Information:

              nonzero          entries/row          row sums
 lev    rows  entries  sparse  min  max    avg      min         max
 ======================================================================
   0    2312    15062   0.003    4    7    6.5  -2.820e-14   1.204e+01
   1     891    23539   0.030    5   66   26.4  -2.191e-14   1.403e+01
   2     232    11524   0.214    6   87   49.7   1.507e-03   1.430e+01
   3      60     1888   0.524    3   48   31.5   7.857e-01   1.398e+01
   4      18      298   0.920   12   18   16.6   4.798e+00   1.915e+01
   5       3        9   1.000    3    3    3.0   8.657e+00   2.148e+01

 Interpolation Matrix Information:

                    entries/row      min         max           row sums
 lev  rows x cols   min  max  avgW   weight      weight      min         max
 ================================================================================
   0  2312 x  891     1   11   4.0  1.086e-02   8.649e-01   5.758e-01   1.000e+00
   1   891 x  232     1   13   5.2  5.073e-03   8.866e-01   4.117e-01   1.000e+00
   2   232 x   60     1   15   5.9  4.015e-03   8.868e-01   2.632e-01   1.000e+00
   3    60 x   18     1    9   4.8  2.704e-03   4.377e-01   1.774e-01   1.000e+00
   4    18 x    3     0    3   1.4  4.017e-03   1.286e-01   0.000e+00   1.000e+00

 Complexity:    grid = 1.520761
            operator = 3.473642
              memory = 4.240274

BoomerAMG SOLVER PARAMETERS:

 Maximum number of cycles:         1
 Stopping Tolerance:               0.000000e+00
 Cycle type (1 = V, 2 = W, etc.):  1

 Relaxation Parameters:
   Visiting Grid:                     down   up  coarse
             Number of sweeps:           1    1     1
   Type 0=Jac, 3=hGS, 6=hSGS, 9=GE:     18   18     9

 Point types, partial sweeps (1=C, -1=F):
                   Pre-CG relaxation (down):   0
                    Post-CG relaxation (up):   0
                              Coarsest grid:   0

HYPRE_BiCGSTABGetPrecond got good preconditioner object
```
@victorapm Thanks for your fix. The problem is that a matrix copy has to be avoided at all costs in our setting. So it would be great to find the bug in my original code, which uses the lower-case "hypre_" variants. Is there any obvious problem with my version?
@horstsoft0815 sorry, but I can't see any obvious problem.
@horstsoft0815 I hope you have been able to find the issue in your code. I'm closing this for now, since we found that hypre_ParCSRMatrixGenerateFFFCHost is behaving correctly. Feel free to reach out if you have other issues.
For BoomerAMG with Extended+i interpolation, the routine hypre_ParCSRMatrixGenerateFFFCHost, called within hypre_BoomerAMGBuildModExtPEInterpHost at par_mod_lr_interp.c:1366, produces spurious "-1" values in an index structure of "As_FF_diag" (par_mod_lr_interp.c:1374). This later causes a crash when hypre_ParMatmul is called at par_mod_lr_interp.c:1661.
The "-1" values result from line gen_fffc.c:155, where they are first written to the array "fine_to_fine" and later written without check (only "CF_marker" is checked) into "A_FF_diag_j" at line gen_fffc.c:385. The problem is that in line gen_fffc.c:155 CF_marker[i] is checked (which is related to the sign of fine_to_fine[i]), but the value of fine_to_fine[A_diag_j[jA]] is accessed.