pghysels / STRUMPACK

Structured Matrix Package (LBNL)
http://portal.nersc.gov/project/sparse/strumpack/

SparseSolverMPIDist for Fortran use #62

Closed GuoqiMa closed 1 year ago

GuoqiMa commented 2 years ago

Hi Pieter! I am now trying to use SparseSolverMPIDist from Fortran. Does STRUMPACK support that? I could not find a subroutine STRUMPACK_set_distributed_csr_matrix in src/fortran/strumpack.f90. Does that mean there is no such interface for Fortran users? Thank you.

pghysels commented 2 years ago

Indeed, it looks like those routines are missing. I will try to add them as soon as possible.

GuoqiMa commented 2 years ago

I see, thank you

GuoqiMa commented 1 year ago

@pghysels Hi Pieter, I saw that the latest version has been released, but it seems this problem has not been fixed. I checked the same code and still could not find the subroutine STRUMPACK_set_distributed_csr_matrix in src/fortran/strumpack.f90.

pghysels commented 1 year ago

I'm sorry for the delay. I know now how to handle the MPI case in the Fortran interface. I will try to add the missing routines this week.

GuoqiMa commented 1 year ago

Thanks very much

pghysels commented 1 year ago

I added the missing routines in this commit: https://github.com/pghysels/STRUMPACK/commit/7473559a57afb3a8424f92b0885816615b10159e Can you give that a try? Please let me know if something is still missing.

pghysels commented 1 year ago

Sorry, I forgot to add a file. Please use this commit instead: https://github.com/pghysels/STRUMPACK/commit/586254b570fc70ba609d327b109870ae97ab40e5

GuoqiMa commented 1 year ago

I am sorry for the late reply. I will try it as soon as possible and give you feedback.

GuoqiMa commented 1 year ago

@pghysels Hi, I changed JA and IA to integer(8) and A to complex(4) in Fortran, but I always get errors like the one below. Do you know what the problem might be?

Initializing STRUMPACK
using 52 OpenMP thread(s)
number of tasking levels = 8 = log_2(#threads) + 3
matching job: maximum matching with row and column scaling
forrtl: severe (174): SIGSEGV, segmentation fault occurred

And I have also changed the data type from Doublecomplex to Floatcomplex.

call STRUMPACK_init_mt(S, STRUMPACK_FloatCOMPLEX, STRUMPACK_MT, 0, c_null_ptr, 1)

pghysels commented 1 year ago

From the output it looks like something might be going wrong in the matching (this is a column permutation to try to get the largest elements on the diagonal). Can you try adding:

call STRUMPACK_set_matching(S, STRUMPACK_MATCHING_NONE);

to disable the matching?

GuoqiMa commented 1 year ago

Hi Pieter, I tried this call, but I still get a segmentation fault:

Initializing STRUMPACK
using 52 OpenMP thread(s)
number of tasking levels = 8 = log_2(#threads) + 3
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
corrupted double-linked list

I only changed the data type from double to single precision for A, and from short integers to long integers for IA and JA. I have copied the whole code below; could you take a look at it?

subroutine STRUMSEQ(N, A, IA, JA, B)
  use, intrinsic :: ISO_C_BINDING
  use strumpack
  implicit none

  integer(c_int) :: k                      ! grid dimension
  integer, target :: N                     ! matrix dimension
  integer(8) :: nnz                        ! matrix nonzeros
  integer :: r, c, i                       ! row, column, index
  integer(c_int) :: ierr
  complex(kind=4), target :: A(*), B(*)    ! kind=8 == complex16
  integer(8), target :: IA(*), JA(*)
  complex(4), dimension(:), allocatable, target :: x
  type(STRUMPACK_SparseSolver) :: S

  call STRUMPACK_init_mt(S, STRUMPACK_FloatCOMPLEX, STRUMPACK_MT, 0, c_null_ptr, 1)
  call STRUMPACK_set_rel_tol(S, 0.0005)
  call STRUMPACK_set_abs_tol(S, 1.0D-11)
  call STRUMPACK_set_gmres_restart(S, 1)
  call STRUMPACK_set_compression(S, STRUMPACK_NONE)
  call STRUMPACK_set_compression_leaf_size(S, 64)
  call STRUMPACK_set_compression_rel_tol(S, dble(1.e-2))
  call STRUMPACK_set_compression_min_sep_size(S, int(N*3/4))

  allocate(x(N))
  x(1:N) = (0.0, 0.0)
  call STRUMPACK_set_csr_matrix(S, c_loc(N), c_loc(IA), c_loc(JA), c_loc(A), 1)
  call STRUMPACK_set_matching(S, STRUMPACK_MATCHING_NONE)
  call STRUMPACK_set_reordering_method(S, STRUMPACK_METIS)

  ! Solve will internally call factor (and reorder if necessary).
  ierr = STRUMPACK_solve(S, c_loc(B), c_loc(x), 0)
  B(1:N) = x(1:N)
  deallocate(x)
end subroutine STRUMSEQ

pghysels commented 1 year ago

If you use 8 byte integers you need to specify STRUMPACK_FLOATCOMPLEX_64 instead of STRUMPACK_FloatCOMPLEX. Can you give that a try?
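
That is, the initialization call in the code above becomes:

call STRUMPACK_init_mt(S, STRUMPACK_FLOATCOMPLEX_64, STRUMPACK_MT, 0, c_null_ptr, 1)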

GuoqiMa commented 1 year ago

Yes, thank you very much.

GuoqiMa commented 1 year ago

@pghysels Hi Pieter, I am still struggling with the Fortran MPI interface. You mentioned ''Please use this commit instead: https://github.com/pghysels/STRUMPACK/commit/586254b570fc70ba609d327b109870ae97ab40e5'', but I have no idea how to use it. If this is meant to reconfigure STRUMPACK, I can ask our manager to do it. Can you give more information about it? Thanks.

pghysels commented 1 year ago

You can just use the master branch, the latest commit.

git clone https://github.com/pghysels/STRUMPACK.git

This will get you the latest version.
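
If you need to rebuild and reinstall from that checkout, a typical CMake build looks roughly like this (the install prefix is just a placeholder, and extra flags for METIS, BLAS/LAPACK, ScaLAPACK, etc. may be needed on your system; see the STRUMPACK installation instructions):

cd STRUMPACK
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/path/to/install
make -j4
make install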

GuoqiMa commented 1 year ago

Hi, hope you are doing well. I tried the latest version, which adds STRUMPACK_set_distributed_csr_matrix. However, when I compile my code I get a confusing error:

[login-3 ]$ mpiifort -O2 -xHost -nofor-main -DBLR_MT -qopenmp -c $filename.f90 -o $filename.o
vsc3strum_SPa.f90(4862): error #6633: The type of the actual argument differs from the type of the dummy argument.   [IROW]
c_loc(A2(left:right)),Irow,1)
---------------------------------^
compilation aborted for vsc3strum_SPa.f90 (code 1)

The code is

Irow = Iend - Istart
call STRUMPACK_set_distributed_csr_matrix(S, c_loc(locN), c_loc(IA2(Istart+1:Iend+1)), c_loc(JA2(left:right)), &
     c_loc(A2(left:right)), Irow, 1)

If I understand correctly, the second-to-last argument is an integer, which is what I pass, so I don't see where the type error comes from. Could you please take a look at it? Thanks a lot.

pghysels commented 1 year ago

The second-to-last argument is dist, and it should be an array of integers. It specifies the block-row distribution over the processes. See the dist array in Figure 2 here: https://portal.nersc.gov/project/sparse/strumpack/v7.1.0/sparse_example_usage.html
dist should be the same on every processor; it should have P+1 elements, with P the total number of MPI ranks, and rank p owning rows [dist[p], dist[p+1]). So dist[0]==0 and dist[P+1]==total_nnz.

GuoqiMa commented 1 year ago

Hi, thanks Pieter. Actually, I had already read that and tried an array Dist(P+1). Now I understand where the problem is.
There is a mistake in Fig. 2 in the link you sent: ''If N = 5 is the number of rows in the entire matrix and P is the total number of processes, then dist[P]=N.'' So dist[P+1] should be the total N instead of dist[P].

Now, using the correct index, I also changed the interface to ''call STRUMPACK_init_mt(S, STRUMPACK_DOUBLECOMPLEX, STRUMPACK_MPI_DIST, 0, c_null_ptr, 1)'', but I still get ''ERROR: wrong interface!''. Do you know why that happens?

And can you please tell me where the number of threads for STRUMPACK is set? I set it inside the code and via an environment variable, but it does not seem to work, and I could not find any relevant information in the documentation.

Appendix: full code

!! distribute data to each process in advance
call STRUMPACK_init_mt(S, STRUMPACK_DOUBLECOMPLEX, STRUMPACK_MPI_DIST, 0, c_null_ptr, 1)
call STRUMPACK_set_reordering_method(S, STRUMPACK_METIS)
locN = Iend - Istart
allocate(x(locN))
x(1:locN) = (0.d0, 0.d0)
IA2(Istart+1:Iend+1) = IA2(Istart+1:Iend+1) - IA2(Istart+1)
JA2(left:right) = JA2(left:right) - 1
call STRUMPACK_set_distributed_csr_matrix(S, c_loc(locN), c_loc(IA2(Istart+1:Iend+1)), c_loc(JA2(left:right)), &
     c_loc(A2(left:right)), c_loc(RowS), 1)
ierr = STRUMPACK_solve(S, c_loc(B2), c_loc(x), 0)
B(1:N) = x(1:N)

pghysels commented 1 year ago

For the distributed code you should use STRUMPACK_init instead of STRUMPACK_init_mt. The second argument to STRUMPACK_init is an MPI communicator.

I will fix the mistake in the documentation.
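
For example, with the rest of the setup unchanged:

! MPI_COMM_WORLD comes from the mpi module (use mpi)
call STRUMPACK_init(S, MPI_COMM_WORLD, STRUMPACK_DOUBLECOMPLEX, STRUMPACK_MPI_DIST, 0, c_null_ptr, 1)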

pghysels commented 1 year ago

Sorry, I was wrong in my comment above, it should be dist[P]=N (not total_nnz), just as in the documentation, and not dist[P+1]=N. Note that I'm counting from 0.
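
So, as a small sketch (nranks, nrows and dist here are illustrative names, not part of the API): dist has nranks+1 entries, identical on every rank, with dist(1) = 0 and dist(nranks+1) = nrows when the Fortran array is indexed from 1:

integer(c_int), dimension(:), allocatable, target :: dist
integer :: p
allocate(dist(nranks+1))
! roughly equal block rows; rank p then owns the 0-based global rows dist(p+1) up to dist(p+2)-1
do p = 0, nranks
   dist(p+1) = int((int(p,8) * int(nrows,8)) / int(nranks,8), c_int)
end do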

GuoqiMa commented 1 year ago

Thank you so much. Using
''call STRUMPACK_init(S, MPI_COMM_WORLD, STRUMPACK_DOUBLECOMPLEX, STRUMPACK_MPI_DIST, 0, c_null_ptr, 1)''
it now works, and the results are correct if I mpirun only 1 process. However, with more than 1 process I get a segmentation fault. I have checked every input (except the last 3) to STRUMPACK_set_distributed_csr_matrix( ), and the data should be correct. So apart from that, I wonder what else could cause this kind of error:

Initializing STRUMPACK
using 17 OpenMP thread(s)
using 3 MPI processes
matching job: maximum matching with row and column scaling
** Warning from MC64A/AD. INFO(1) = 2
Some scaling factors may be too large.
WARNING: mc64 scaling produced large scaling factors which may cause overflow!
matrix equilibration, r_cond = 1 , c_cond = 1 , type = N
Matrix padded with zeros to get symmetric pattern.
Number of nonzeros increased from 18,741,393 to 23,638,279.
initial matrix:

GuoqiMa commented 1 year ago

And can you please tell me where the number of threads for STRUMPACK is set? I set it inside the code and via an environment variable, but it does not seem to work, and I could not find any relevant information in the documentation.

pghysels commented 1 year ago

I see there is this error message:

****** Warning from MC64A/AD. INFO(1) = 2
Some scaling factors may be too large.
WARNING: mc64 scaling produced large scaling factors which may cause overflow!

Do you also see this when running with 1 MPI rank? If not, there is probably something wrong with how you specify the distributed matrix.

To set the number of threads, in bash for instance, just set the following environment variable:

export OMP_NUM_THREADS=4

GuoqiMa commented 1 year ago

Oh, sorry, I never realized that OMP_NUM_THREADS had to be capitalized exactly like that.

Yes, for 1 process there are no such warnings. I distributed the matrix following Fig. 2 in the documentation. Do you have any idea what causes such warnings, and what INFO(1) = 2 means?

GuoqiMa commented 1 year ago

1 MPI process:

Initializing STRUMPACK
using 2 OpenMP thread(s)
using 1 MPI processes
matching job: maximum matching with row and column scaling
matrix equilibration, r_cond = 1 , c_cond = 1 , type = N
initial matrix:

pghysels commented 1 year ago

info(1)=2 is explained here: https://github.com/pghysels/STRUMPACK/blob/87b7d7b768ec072e4a9b51d4d5437086a8e90c8f/src/sparse/MC64ad.cpp#LL338C17-L338C17

The MC64 code is used to find a column permutation of the matrix that maximizes the diagonal entries. MC64 is then also used to scale the rows/columns of the matrix so that the diagonal entries are all 1. The warning means that these scaling factors are too large. MC64 is a sequential code, so it needs to gather the matrix onto one MPI rank first, and it should therefore behave the same on 1 rank as on multiple ranks. Since you only see the warning with multiple ranks, I think there is something wrong with your matrix distribution.

I suspect there is an off-by-1 error somewhere, related to mixing Fortran and C/C++.

It shouldn't be necessary, but in order to debug, can you try making sure that row_ptr[0] = 0 on every rank?
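
For example, something along these lines, reusing the variable names from your appendix code above (IA2 being the row pointer and Istart/Iend delimiting this rank's rows):

! if IA2 still holds the global (unshifted) row pointer, shift the local part first:
IA2(Istart+1:Iend+1) = IA2(Istart+1:Iend+1) - IA2(Istart+1)
! quick sanity check right before STRUMPACK_set_distributed_csr_matrix:
if (IA2(Istart+1) /= 0) print *, 'rank-local row_ptr does not start at 0!'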

GuoqiMa commented 1 year ago

Yes, I have checked that row_ptr[0] = 0 on every rank, but the array index of the local row_ptr does not start from 0 on every rank. For example, on rank 0 it is row_ptr[1:3] = [0, 1, 3], meaning row 1 has 1 nnz and row 2 has 2 nnz; on rank 2 it is row_ptr[3:6] = [0, 4, 6], ... This data follows the documentation, but I don't know whether this indexing is allowed, since the documentation does not say.

I also tried calling STRUMPACK_set_matching(S, MAX_CARDINALITY); the warning is gone, but the error is still there:

Initializing STRUMPACK
using 2 OpenMP thread(s)
using 3 MPI processes
matching job: none
matrix equilibration, r_cond = 1 , c_cond = 1 , type = N
initial matrix:

GuoqiMa commented 1 year ago

Also, my matrix is symmetric, so do I input just half of the CSR matrix, or still the full matrix?

pghysels commented 1 year ago

You still need to input the full matrix.
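
If you currently only store one triangle, you need to expand it to the full pattern before passing it in. A rough sketch of such an expansion for a sequential 1-based CSR matrix with the upper triangle stored (all names and types are illustrative, not part of the STRUMPACK API; put the routine in a module so the allocatable dummies get an explicit interface, and for a Hermitian rather than symmetric matrix the mirrored value would be conjugated):

subroutine sym_expand(n, ia, ja, a, ia_f, ja_f, a_f)
  ! expand an upper-triangular CSR (ia, ja, a) into the full symmetric CSR (ia_f, ja_f, a_f)
  implicit none
  integer, intent(in) :: n
  integer, intent(in) :: ia(n+1), ja(*)
  complex(8), intent(in) :: a(*)
  integer, allocatable, intent(out) :: ia_f(:), ja_f(:)
  complex(8), allocatable, intent(out) :: a_f(:)
  integer, allocatable :: cnt(:)
  integer :: i, j, k, p
  allocate(cnt(n)); cnt = 0
  do i = 1, n                          ! count entries per row of the full matrix
    do k = ia(i), ia(i+1)-1
      j = ja(k)
      cnt(i) = cnt(i) + 1
      if (j /= i) cnt(j) = cnt(j) + 1  ! mirrored (j,i) entry
    end do
  end do
  allocate(ia_f(n+1)); ia_f(1) = 1
  do i = 1, n
    ia_f(i+1) = ia_f(i) + cnt(i)
  end do
  allocate(ja_f(ia_f(n+1)-1), a_f(ia_f(n+1)-1)); cnt = 0
  do i = 1, n                          ! fill both (i,j) and (j,i)
    do k = ia(i), ia(i+1)-1
      j = ja(k)
      p = ia_f(i) + cnt(i); ja_f(p) = j; a_f(p) = a(k); cnt(i) = cnt(i) + 1
      if (j /= i) then
        p = ia_f(j) + cnt(j); ja_f(p) = i; a_f(p) = a(k); cnt(j) = cnt(j) + 1
      end if
    end do
  end do
  deallocate(cnt)
end subroutine sym_expand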

pghysels commented 1 year ago

Your matrix is quite large. Does it work with the small example from the documentation?

GuoqiMa commented 1 year ago

I didn't try it yet; maybe tomorrow I will do more tests. Have a nice day.

GuoqiMa commented 1 year ago

Hi Pieter. I changed fexample into an MPI version. Unfortunately, it is not working. Since there are no Fortran MPI examples, I can send you this MPI example if you need it for debugging.

pghysels commented 1 year ago

I will add an MPI example soon.

GuoqiMa commented 1 year ago

I also have an important question regarding the sparse solver. Can it solve multiple right-hand-side vectors simultaneously?

GuoqiMa commented 1 year ago

Hi Pieter. I would like to give an update: it is working now. The problem was indeed in my code, where the precision was wrong when distributing the matrix.

But I have another question. Is the estimated memory usage only for the factors, not for the whole of STRUMPACK? I checked that the estimated memory is 72 GB, but the actual usage is 117 GB.

pghysels commented 1 year ago

Ah that is excellent news. I'm sorry for the delay in adding a small example code. I am traveling at the moment.

Yes, the estimated memory usage is only for the final LU factorization of the matrix. The peak memory usage is going to be more than this estimate, because some temporary work memory is also required during the factorization. Also, the estimate does not include the storage for the sparse matrix and some other metadata.

pghysels commented 1 year ago

To solve multiple right-hand sides, you should use STRUMPACK_matsolve.
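
Something along these lines, with the right-hand sides stored column-major; the nrhs/ldb/ldx argument order below should be double-checked against the wrapper in src/fortran/strumpack.f90:

! assumed argument order (please verify in src/fortran/strumpack.f90):
! nrhs right-hand sides in B(ldb, nrhs), solutions returned in X(ldx, nrhs)
ierr = STRUMPACK_matsolve(S, nrhs, c_loc(B), ldb, c_loc(X), ldx, 0)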