tygamvrelis closed this issue 5 years ago
When I run resistivity_profile with 16 procs, it takes around 30 to 40 minutes to reach t = 30.0, which is much faster than before (previously it took on the order of several hours).
So far, I've noticed one issue: bg_field.dat and eta_field.dat only contain 2700 lines instead of the expected 43200. Since 43200 / 2700 = 16 (the number of processors), it looks like each processor is only dumping the field values for its own portion of the domain, i.e., the code I've written to dump these fields isn't compatible with multiple processors.
Edit: upon experimenting a bit further, it is definitely the case that each processor runs the analysis loop independently. I'll have to see whether the user manual has any notes about coordinating the processors in the analysis function.
I'm seeing this code example in the user manual:
#ifdef PARALLEL
/* Sum the per-process kinetic energies into a global total */
MPI_Allreduce (&Ekin, &scrh, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
Ekin = scrh;
/* Take the maximum thermal energy over all processes */
MPI_Allreduce (&Eth_max, &scrh, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
Eth_max = scrh;
/* Synchronize: wait until every process reaches this point */
MPI_Barrier (MPI_COMM_WORLD);
#endif
I'll have to look into these MPI_* functions. It looks like MPI_Allreduce combines a value across all processes (e.g., by summing or taking the maximum) and distributes the result back to each of them, which is exactly what's needed to coordinate the analysis output.
Alternatively, we could output a different file for each processor, corresponding to the field values on its subdomain, and reassemble them in Python afterwards. This would require knowing how the domain is partitioned among the processors.
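A minimal sketch of this per-processor idea, assuming PLUTO's global prank variable (the MPI rank of the calling process) and its DOM_LOOP macro; the file name pattern and the field being written (d->Vc[BX1]) are just illustrative placeholders:

/* Inside Analysis(): each process writes its own portion of a field
   to a rank-tagged file, e.g. bg_field.0003.dat for rank 3. */
int i, j, k;
char fname[64];
FILE *fp;

sprintf (fname, "bg_field.%04d.dat", prank);
fp = fopen (fname, "w");
DOM_LOOP (k, j, i) {   /* loop over this process's subdomain */
  fprintf (fp, "%12.6e\n", d->Vc[BX1][k][j][i]);
}
fclose (fp);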
One idea I have for solving this issue of parallel output is as follows:
Since user variables, such as current, seem to be handled correctly in parallel, I can add 6 new user variables: B0x1, B0x2, B0x3, etax1, etax2, etax3.
When entering the compute user output function, these fields will be populated and dumped automatically (a sketch follows below). Then I will disable dumping for them, so the first output file will have the B0 and eta fields while the others will not. This saves space, and also memory when running the Python script (memory usage is already quite high when 100+ output files are loaded; we don't need redundant copies of static field data making that worse).
For heating, I will use MPI to sum the heating calculations from all the processors and will write the output from proc 0. This would use code similar to the example in my previous comment (also sketched below).
Only proc 0 will write the files for user parameters and constants.
The Python code will need to be updated so that it extracts the B0 and eta fields from the first output file.
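A minimal sketch of the two pieces above, assuming PLUTO's GetUserVar() accessor for user-defined output variables, its BackgroundField() function, and its prank and g_time globals; the grid coordinate access and the heating accumulation are placeholders that will differ by PLUTO version and problem setup:

/* In userdef_output.c: populate the new user variables with the
   background field on this process's subdomain. */
void ComputeUserVar (const Data *d, Grid *grid)
{
  int i, j, k;
  double ***B0x1 = GetUserVar ("B0x1");
  double ***B0x2 = GetUserVar ("B0x2");
  double ***B0x3 = GetUserVar ("B0x3");
  /* likewise for etax1, etax2, etax3 */
  double *x1 = grid->x[IDIR];  /* PLUTO 4.2 layout; older versions use grid[IDIR].x */
  double *x2 = grid->x[JDIR];
  double *x3 = grid->x[KDIR];

  DOM_LOOP (k, j, i) {
    double B0[3];
    BackgroundField (x1[i], x2[j], x3[k], B0);
    B0x1[k][j][i] = B0[0];
    B0x2[k][j][i] = B0[1];
    B0x3[k][j][i] = B0[2];
  }
}

/* In Analysis(): sum the per-process heating and let proc 0 write it. */
void Analysis (const Data *d, Grid *grid)
{
  double heating = 0.0, scrh;

  /* ... accumulate this process's heating contribution into heating ... */

#ifdef PARALLEL
  MPI_Allreduce (&heating, &scrh, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  heating = scrh;
#endif

  if (prank == 0) {  /* only the master process writes */
    FILE *fp = fopen ("heating.dat", "a");
    fprintf (fp, "%12.6e  %12.6e\n", g_time, heating);
    fclose (fp);
  }
}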
I have the approach above working on a branch, with one modification. It turns out that the pyPLUTO library is not able to load several dataframes when their sizes differ. Therefore, in the interest of simplicity, all the B0x and etax fields will be dumped in every output file.
If, in the future, it becomes important to save space/memory when working with these data files, note that the following C code can be used to disable the dumping of a user variable on-the-fly:
SetDumpVar("B0x1", DBL_OUTPUT, NO);
Use the PLUTO Python utility to change the makefile to Linux mpigcc; this builds a parallel version of the code.
Also add this to the job submission file:
module load gcc/7.3.0 openmpi/3.0.0-gcc-7.3.0