yt-project / libyt

In-situ analysis with yt
https://libyt.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Need `OMPI_MCA_osc=sm,pt2pt` when using libyt #86

Open cindytsai opened 1 year ago

cindytsai commented 1 year ago

When I was developing the features that use RMA (remote memory access), all I cared about was making them work on HPC systems, and I didn't think much about why this parameter is needed for libyt to run there. We don't need it on a single machine, e.g. my laptop.

TODO

Problems

When do we need this?

Attaching the same pointer multiple times

When I was testing particle arrays with an example like this:

int temp[1] = {myrank};  // one-element array holding this rank
grids_local[index_local].particle_data[0][3].data_ptr = temp;

I get this error:

[xps:25522] *** An error occurred in MPI_Win_attach
[xps:25522] *** reported by process [3353411585,1]
[xps:25522] *** on win rdma window 3
[xps:25522] *** MPI_ERR_RMA_ATTACH: Could not attach RMA segment
[xps:25522] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[xps:25522] ***    and potentially your MPI job)
[xps:25513] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[xps:25513] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

This is probably caused by attaching the same data to a window more than once. Strangely, though, the error goes away when running with:

OMPI_MCA_osc=sm,pt2pt mpirun -np 4 ./example
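
If the cause really is the same pointer being attached more than once, one possible guard (instead of switching the OSC component) is to keep a small reference-counted table of attached base addresses and only call MPI_Win_attach the first time a pointer is seen. My reading of the MPI standard is that only non-overlapping regions may be attached to one window, so this seems like a reasonable thing to check. The sketch below is plain C with no MPI calls; `attach_registry`, `registry_acquire`, `registry_release` and `MAX_REGIONS` are hypothetical names made up for illustration, not part of libyt or MPI, and it only compares exact base pointers, not overlapping ranges.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 64 /* hypothetical capacity, chosen arbitrarily */

/* One attached base address and how many users asked for it. */
typedef struct {
    const void *base;
    int refcount;
} attached_region;

/* Hypothetical per-window bookkeeping; zero-initialize before use. */
typedef struct {
    attached_region slots[MAX_REGIONS];
    size_t used;
} attach_registry;

/* Returns 1 if the caller should really call MPI_Win_attach for `base`,
 * 0 if `base` is already attached (only the count is bumped),
 * -1 if the table is full. */
int registry_acquire(attach_registry *reg, const void *base)
{
    assert(reg != NULL && base != NULL);
    for (size_t i = 0; i < reg->used; ++i) {
        if (reg->slots[i].base == base) {
            reg->slots[i].refcount++;
            return 0;
        }
    }
    if (reg->used == MAX_REGIONS) {
        return -1;
    }
    reg->slots[reg->used].base = base;
    reg->slots[reg->used].refcount = 1;
    reg->used++;
    return 1;
}

/* Returns 1 if the caller should really call MPI_Win_detach for `base`,
 * 0 if other users still need it, -1 if `base` was never registered. */
int registry_release(attach_registry *reg, const void *base)
{
    assert(reg != NULL && base != NULL);
    for (size_t i = 0; i < reg->used; ++i) {
        if (reg->slots[i].base == base) {
            if (--reg->slots[i].refcount > 0) {
                return 0;
            }
            /* Last user gone: fill the hole with the final slot. */
            reg->slots[i] = reg->slots[reg->used - 1];
            reg->used--;
            return 1;
        }
    }
    return -1;
}
```

With the `temp` example above, the first `registry_acquire(&reg, temp)` would return 1 (do the attach) and any later call with the same `temp` would return 0 (skip it). Whether this actually removes the need for `OMPI_MCA_osc=sm,pt2pt` is untested.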

When running on Taiwania 3 and Eureka

We need to add:

OMPI_MCA_osc=sm,pt2pt mpirun -np 4 ./example

Otherwise I get an error:

(something related to attaching...)