Limit on dipole number due to int in MPI calls

yogevb / a-dda

Automatically exported from code.google.com/p/a-dda

All MPI functions used by ADDA take n_elem arguments (numbers of elements to exchange) as int. This probably limits the number of (non-void) dipoles usable by ADDA to INT_MAX, which on most 32- and 64-bit systems is 2^31-1, about 2*10^9. There are two things to do:
1) Look at the MPI calls in more detail and test for possible int overflows. Then add error checks, so that ADDA produces a meaningful error message instead of wrong results.
2) Try to switch either to derived datatypes or to MPI functions designed specifically for huge data arrays (working with MPI_Aint). This should remove or alleviate the limit.
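A minimal sketch of both steps in C, assuming element counts are kept internally as size_t. The names checked_mpi_count, split_count and count_split are hypothetical and not ADDA's actual code. The derived-datatype approach is only indicated in a comment (MPI_Type_contiguous is a standard MPI call); the block itself uses no MPI, so it compiles standalone.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (step 1): convert a size_t element count to the int
 * that MPI calls expect, stopping with a meaningful message instead of
 * silently wrapping around and producing wrong results. */
static int checked_mpi_count(size_t n_elem, const char *where)
{
    if (n_elem > (size_t)INT_MAX) {
        fprintf(stderr,
                "Error in %s: %zu elements exceed INT_MAX (%d) allowed in MPI calls\n",
                where, n_elem, (int)INT_MAX);
        exit(EXIT_FAILURE);
    }
    return (int)n_elem;
}

/* Hypothetical decomposition (step 2): a huge count is split into n_blocks
 * blocks of block_len elements plus a remainder, each fitting into an int.
 * In real MPI code one block could be described by a derived datatype,
 * e.g. MPI_Type_contiguous(block_len, base_type, &block_type), sent with
 * count n_blocks, followed by the remainder sent with base_type. */
typedef struct {
    int block_len;
    int n_blocks;
    int remainder;
} count_split;

/* block_len must be positive */
static count_split split_count(size_t n_elem, int block_len)
{
    count_split s;
    s.block_len = block_len;
    s.n_blocks = checked_mpi_count(n_elem / (size_t)block_len, "split_count");
    /* the remainder is smaller than block_len, so it always fits into an int */
    s.remainder = (int)(n_elem % (size_t)block_len);
    return s;
}
```

For example, 5*10^9 elements with blocks of 2^20 elements give 4768 blocks plus a remainder of 389632 elements, so every count passed to MPI stays well below INT_MAX.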

The problem is not that bad:

1) The requirement nvoid_Ndip <= INT_MAX exists only for radiation forces and
is a consequence of the current inefficient implementation. This implementation
requires at least 72*nvoid_Ndip bytes on the ROOT process, which is still quite
a lot.

2) Another, optional, limitation is 2*boxXY <= INT_MAX, which applies only when
the WKB initial field is used (this should not be a problem).

3) The main current limitation (always present) is that the block transpose
requires 12*local_Ndip/number_of_processors <= INT_MAX (or 16*... for
-no_reduced_FFT). This also should not be a problem for some time, since
currently such huge simulations can only be performed using a large number of
processors.
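The three conditions above can be written as explicit checks. This is a hypothetical C sketch (the function names are illustrative, not taken from ADDA) that uses 64-bit arithmetic so that the checks themselves cannot overflow. Rounding the transpose block size up is an assumption made to stay on the safe side.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* 1) Radiation forces only: nvoid_Ndip <= INT_MAX. */
static bool rad_force_ok(uint64_t nvoid_Ndip)
{
    return nvoid_Ndip <= (uint64_t)INT_MAX;
}

/* Lower bound on memory needed on the ROOT process by the current
 * radiation-force implementation: 72 bytes per non-void dipole. */
static uint64_t rad_force_root_bytes(uint64_t nvoid_Ndip)
{
    return 72u * nvoid_Ndip;
}

/* 2) WKB initial field only: 2*boxXY <= INT_MAX, tested as
 * boxXY <= INT_MAX/2, which is equivalent for integers and cannot overflow. */
static bool wkb_ok(uint64_t boxXY)
{
    return boxXY <= (uint64_t)INT_MAX / 2;
}

/* 3) Always present: block transpose needs
 * factor*local_Ndip/nprocs <= INT_MAX, with factor 12 by default and 16 for
 * -no_reduced_FFT. The division is rounded up (assumption, conservative).
 * Assumes nprocs > 0 and factor*local_Ndip fits into 64 bits. */
static bool transpose_ok(uint64_t local_Ndip, uint64_t nprocs, bool reduced_fft)
{
    uint64_t factor = reduced_fft ? 12u : 16u;
    uint64_t per_proc = (factor * local_Ndip + nprocs - 1) / nprocs;
    return per_proc <= (uint64_t)INT_MAX;
}
```

For example, local_Ndip = 2^30 on 8 processors passes with the reduced FFT (12*2^30/8 is about 1.6*10^9) but fails with -no_reduced_FFT (16*2^30/8 = 2^31). At nvoid_Ndip = INT_MAX, rad_force_root_bytes gives 154618822584 bytes, about 144 GiB on ROOT.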

Additionally, r adds error checks, which addresses the first part of the issue.
The second part is part of issue 20.

Original comment by yurkin on 16 May 2012 at 1:53

Changed state: Fixed

Limit on dipole number due to int in MPI calls #137