marvel-nccr / quantum-mobile

A Virtual Machine for computational materials science
https://quantum-mobile.readthedocs.io

running MPI jobs of QM ova on AWS instance #76

Closed EladSegev closed 6 years ago

EladSegev commented 6 years ago

Hi,

I'm trying to run the VM on an AWS machine, and when running pw.x with MPI I get this at the start of the output file:

[[10382,1],0]: A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces:

Module: OpenFabrics (openib) Host: qmobile

Another transport will be used instead, although this may result in lower performance.

 Program PWSCF v.6.2 starts on 17Jul2018 at 10: 3:13

What do you suggest I do to fix this and fully utilize MPI for the parallelization?

Thanks, Elad

ltalirz commented 6 years ago

Hi Elad,

I'm not an expert in this, but looking here, it seems this may indicate that Open MPI is not finding any InfiniBand interfaces, which probably simply aren't there on the AWS machine. Anyhow, I guess you are using QE on just a single AWS VM (?), in which case MPI communication only happens between processes on the same (virtual) machine, so the missing InfiniBand transport should not matter much.
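If the warning is just noise on a single VM, one option is to tell Open MPI not to try the openib module at all. Below is a minimal sketch of how such a command line could be assembled, assuming an Open MPI version that still ships the openib BTL and accepts `--mca btl ^openib`. The helper name `build_mpirun_command`, the input file name `pw.in` and the process count are placeholders for illustration, not something from this thread.

```python
import shlex


def build_mpirun_command(n_procs, executable="pw.x", input_file="pw.in",
                         exclude_openib=True):
    """Build an mpirun argument list for a single-node run (hypothetical helper)."""
    cmd = ["mpirun", "-np", str(n_procs)]
    if exclude_openib:
        # "^openib" asks Open MPI to use every available BTL except openib,
        # so it no longer probes for InfiniBand interfaces.
        cmd += ["--mca", "btl", "^openib"]
    # pw.x reads its input from the file given after -in
    cmd += [executable, "-in", input_file]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_mpirun_command(4)))
```

This only changes which transport Open MPI probes for; it is not expected to make the calculation itself faster. If you would rather keep the default transport selection and only silence the message, setting the MCA parameter `btl_base_warn_component_unused` to `0` should do that.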

Finally, please note that, while I will happily incorporate any performance improvements I come across, running the software on bare hardware and on supercomputers will always be faster than running it inside a Quantum Mobile virtual machine.

Closing this for the moment; happy to reopen in case of new info.

ltalirz commented 6 years ago

@EladSegev P.S. Very happy to hear you've managed to get it running on AWS on your own - given that this is starting to happen now, I've decided to make our slightly modified AWS/OpenStack playbook repo public as well: https://github.com/materialscloud-org/ansible-playbook-workhorse

EladSegev commented 6 years ago

Yes, you just have to import the OVA file to create an AMI and then choose it when you launch a machine.
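For anyone wanting to reproduce this: the thread does not say which import mechanism was used, so the following is only a sketch assuming AWS VM Import/Export, with the OVA already uploaded to an S3 bucket. It generates the disk-container description that `aws ec2 import-image` reads. The bucket name, object key and the function name `disk_containers` are hypothetical.

```python
import json


def disk_containers(bucket, key, description="Quantum Mobile OVA"):
    """Return the disk-container list describing an OVA stored in S3."""
    return [
        {
            "Description": description,
            "Format": "ova",
            "UserBucket": {"S3Bucket": bucket, "S3Key": key},
        }
    ]


if __name__ == "__main__":
    # Hypothetical bucket and key; replace with wherever the OVA was uploaded.
    containers = disk_containers("my-import-bucket", "quantum-mobile.ova")
    # Save this output as containers.json, then run something like:
    #   aws ec2 import-image --disk-containers file://containers.json
    print(json.dumps(containers, indent=2))
```

The import runs as a background task on the AWS side and, as far as I know, needs the VM Import service role to be set up in the account first. Once it finishes, the resulting AMI can be selected when launching an instance.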


ltalirz commented 6 years ago

Ah, I see... In this case, the issue you noticed may also come from the fact that the OVA is a VirtualBox VM, which might have custom network interfaces, kernel extensions, etc. I recommend you build Quantum Mobile directly on AWS using the repo linked above.