guagua-pamcn / a-dda

Automatically exported from code.google.com/p/a-dda

Limit on dipole number due to int in MPI calls #137

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
All MPI functions used by ADDA take the n_elem argument (number of elements to 
exchange) as int. This probably limits the number of (non-void) dipoles usable 
by ADDA to INT_MAX, which on most 32- and 64-bit systems is 2^31-1, or about 
2*10^9. There are two things to do:

1) Look at the MPI calls in more detail and test for possible int overflows. 
Then add error checks, so that ADDA produces a meaningful error message instead 
of wrong results.

2) Try to switch either to derived datatypes or to MPI functions designed 
specifically for huge data arrays (those working with MPI_Aint). This should 
remove or alleviate the limit.
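
A minimal sketch of both steps, in plain C without MPI so that it can be read 
on its own. The helper names count_to_int and split_count are hypothetical and 
are not part of ADDA. The first shows the kind of check meant in 1). The second 
shows the arithmetic behind 2): if a derived datatype of block_len elements is 
built (for example with MPI_Type_contiguous), the int count passed to MPI 
becomes the number of blocks, and the remainder has to be sent separately.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not ADDA code): converts an element count to the int
 * expected by classic MPI calls. Returns 0 on success and -1 if the count
 * does not fit into int, so that the caller can report a meaningful error
 * instead of passing a truncated value to MPI. */
static int count_to_int(size_t n_elem, int *out)
{
    assert(out != NULL);
    if (n_elem > (size_t)INT_MAX) return -1;
    *out = (int)n_elem;
    return 0;
}

/* Hypothetical helper (not ADDA code): splits n_elem into n_blocks full
 * blocks of block_len elements plus a remainder smaller than block_len.
 * With a derived datatype describing one block, n_blocks would be the count
 * given to MPI. Returns -1 if block_len is not positive or if even the
 * number of blocks does not fit into int. */
static int split_count(size_t n_elem, int block_len,
                       int *n_blocks, int *remainder)
{
    assert(n_blocks != NULL && remainder != NULL);
    if (block_len <= 0) return -1;
    size_t blocks = n_elem / (size_t)block_len;
    if (blocks > (size_t)INT_MAX) return -1;
    *n_blocks = (int)blocks;
    *remainder = (int)(n_elem % (size_t)block_len);
    return 0;
}
```

With block_len around 10^6 this raises the usable element count by roughly the 
same factor, at the price of one extra call for the remainder.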

Original issue reported on code.google.com by yurkin on 23 Nov 2011 at 6:34

GoogleCodeExporter commented 8 years ago
The problem is not that bad:

1) The requirement nvoid_Ndip <= INT_MAX exists only for radiation forces and 
is a consequence of the current inefficient implementation. This implementation 
requires at least 72*nvoid_Ndip bytes on the ROOT process, which is still quite 
a lot (roughly 150 GB when nvoid_Ndip reaches INT_MAX).

2) Another, optional limitation is 2*boxXY <= INT_MAX, which applies only when 
the WKB initial field is used (should not be a problem).

3) The main current limitation (always present) is in the block transpose, 
which requires 12*local_Ndip/number_of_processors <= INT_MAX (or 16*... for 
-no_reduced_FFT). This also should not be a problem for some time (since 
currently such huge simulations can only be performed using a large number of 
processors).
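
As an illustration of the inequality in 3), here is a small check in plain C. 
It reads the condition literally as written above. The function name and the 
reduced_fft flag are hypothetical and do not come from ADDA; the factors 12 
and 16 are the ones quoted in 3).

```c
#include <limits.h>

/* Hypothetical check (not ADDA code). Returns 1 if
 * factor*local_Ndip/nprocs <= INT_MAX and 0 otherwise, where factor is 12
 * for the default reduced FFT and 16 for -no_reduced_FFT. */
static int block_transpose_fits(unsigned long long local_Ndip,
                                unsigned long long nprocs,
                                int reduced_fft)
{
    const unsigned long long factor = reduced_fft ? 12ULL : 16ULL;
    if (nprocs == 0) return 0;
    /* If the product itself would overflow, conservatively report failure. */
    if (local_Ndip > ULLONG_MAX / factor) return 0;
    return factor * local_Ndip / nprocs <= (unsigned long long)INT_MAX;
}
```

For example, with a 32-bit int, local_Ndip = 2*10^9 fails the check on a single 
processor but passes on 16, which matches the remark that such sizes are only 
reachable with many processors anyway.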

Additionally, r adds error checks, which addresses the first part of the issue. 
The second part is a part of issue 20.

Original comment by yurkin on 16 May 2012 at 1:53