Closed: nyue closed this issue 8 months ago
I don't know the exact MPI command to get this working, but I can confirm that it does work.
It will be something similar to the documentation in MPI Offload Rendering using the split launch:
mpirun -n 1 ./ospExamples --osp:load-modules=mpi --osp:device=mpiOffload : -n <N> ./ospray_mpi_worker
But the --osp arguments won't be passed along to OSPRay this way.
@carsonbrownlee or @Twinklebear can likely help here.
The above should work (replace ospExamples with usdview), because the command line args are passed to ospInit.
My main binary (ospExamples/usdview) is running on the host head0.
My ospray_mpi_worker is running on the host compute0.
How do I include this compute0 hostname information, so that the interactive code runs on head0 while the MPI backend runs on a cluster whose main machine is compute0?
Cheers
Hi nyue, the above-mentioned command line uses an MPI split launch; the portion after the colon is run as another MPI process. However, the command-line arguments presented won't work for usdview. You can specify OSPRay-specific arguments through the HDOSPRAY_INIT_ARGS environment variable, or use the specific environment variables I show below.
The exact command to run may depend on your mpi implementation, but may look like:
OSPRAY_LOAD_MODULES=mpi OSPRAY_DEVICE=mpiOffload mpirun -n 1 -host head0 -ppn 1 usdview --renderer OSPRay <path to USD file> : -n 1 -host compute0 <path to binary directory of ospray install>/ospray_mpi_worker
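If it is awkward to set per-process environment variables with your MPI launcher, the OSPRay arguments can instead go through HDOSPRAY_INIT_ARGS, which the comment above mentions. This is a sketch only: the host names head0/compute0 and the /opt/ospray install path are placeholders taken from this thread, and the exact mpirun flags depend on your MPI implementation.

```shell
# Sketch: collect the OSPRay init arguments in HDOSPRAY_INIT_ARGS
# (read by hdOSPRay) instead of separate OSPRAY_* variables.
# Host names and paths below are placeholders from this thread.
export HDOSPRAY_INIT_ARGS="--osp:load-modules=mpi --osp:device=mpiOffload"
mpirun -n 1 -host head0 -ppn 1 \
    usdview --renderer OSPRay /path/to/scene.usd \
  : -n 1 -host compute0 \
    /opt/ospray/bin/ospray_mpi_worker
```

This launch fragment cannot run outside an MPI cluster with usdview and OSPRay installed, so treat it as a template to adapt rather than a command to copy verbatim.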
I am revisiting MPI and hdOSPRay; how can I debug this further?
env OSPRAY_DEVICE=mpiOffload OSPRAY_LOAD_MODULES=mpi PYTHONPATH=/opt/hdospray/lib/python LD_LIBRARY_PATH=/opt/hdospray/lib mpirun -hosts 192.168.0.16,192.168.0.16 -n 1 -ppn 1 /opt/hdospray/bin/usdview --renderer OSPRay ~/temp/USD/Kitchen_set/Kitchen_set.usd : -n 1 /opt/hdospray/bin/ospray_mpi_worker
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 7533 RUNNING AT 192.168.0.16
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Running with the verbose flag generated more diagnostics but did not help me pinpoint the source of the problem.
[mpiexec@head0] Launch arguments: /usr/bin/hydra_pmi_proxy --control-port 192.168.0.16:34617 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
[proxy:0:0@head0] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:0@head0] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@head0] got pmi command (from 0): get_maxes
[proxy:0:0@head0] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@head0] got pmi command (from 0): get_appnum
[proxy:0:0@head0] PMI response: cmd=appnum appnum=0
[proxy:0:0@head0] got pmi command (from 0): get_my_kvsname
[proxy:0:0@head0] PMI response: cmd=my_kvsname kvsname=kvs_10185_0
[proxy:0:0@head0] got pmi command (from 0): get_my_kvsname
[proxy:0:0@head0] PMI response: cmd=my_kvsname kvsname=kvs_10185_0
[proxy:0:0@head0] got pmi command (from 0): barrier_in
[proxy:0:0@head0] forwarding command (cmd=barrier_in) upstream
[mpiexec@head0] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@head0] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:0@head0] PMI response: cmd=barrier_out
[proxy:0:0@head0] got pmi command (from 0): put
kvsname=kvs_10185_0 key=P0-businesscard value=description#192.168.0.16$port#49241$ifname#192.168.0.16$
[proxy:0:0@head0] cached command: P0-businesscard=description#192.168.0.16$port#49241$ifname#192.168.0.16$
[proxy:0:0@head0] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0@head0] got pmi command (from 0): barrier_in
[proxy:0:0@head0] flushing 1 put command(s) out
[proxy:0:0@head0] forwarding command (cmd=put P0-businesscard=description#192.168.0.16$port#49241$ifname#192.168.0.16$) upstream
[proxy:0:0@head0] forwarding command (cmd=barrier_in) upstream
[mpiexec@head0] [pgid: 0] got PMI command: cmd=put P0-businesscard=description#192.168.0.16$port#49241$ifname#192.168.0.16$
[mpiexec@head0] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@head0] PMI response to fd 6 pid 0: cmd=keyval_cache P0-businesscard=description#192.168.0.16$port#49241$ifname#192.168.0.16$
[mpiexec@head0] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:0@head0] PMI response: cmd=barrier_out
FATAL ERROR DURING INITIALIZATION!
I would make sure your launch command works with OSPRay itself first: try the same command with ospExamples. Typically the first host would be localhost, where the GUI should be launched, but beyond that it looks good to me. MPI can be tricky to debug. A typical method I use is to run each process under "xterm -e gdb --args ...".
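The xterm/gdb trick mentioned above can be spelled out as follows. This is a sketch under a few assumptions: an X display is reachable from both hosts, the binaries were built with debug symbols, and your mpirun accepts -host in a split launch.

```shell
# Sketch: run each MPI rank of the split launch under gdb in its own
# xterm, so the crashing rank's backtrace is visible. Uses ospExamples,
# as suggested above, to rule out hdOSPRay/usdview first.
OSPRAY_LOAD_MODULES=mpi OSPRAY_DEVICE=mpiOffload \
mpirun -n 1 -host head0 \
    xterm -e gdb --args ./ospExamples \
  : -n 1 -host compute0 \
    xterm -e gdb --args ./ospray_mpi_worker
```

Type run in each gdb window to start the ranks; after the segfault, bt in the window of the failing rank prints its backtrace.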
Yeah, I'd second @carsonbrownlee's recommendation to try this command line with ospExamples, to check whether it's an issue in the environment or MPI stack on the system, or something in OSPRay. The diagnostic output seems to point to an issue during MPI setup.
One thing might be worth checking, though I'm not quite sure how MPI handles it. In your command:
mpirun -hosts 192.168.0.16,192.168.0.16 -n 1 -ppn 1
You list the same host IP twice and specify 1 process per node; I'm not sure whether MPI will see this as the same node or as two nodes (since it's in the host list twice). That might cause the error. You could try listing the IP once and passing -ppn 2 to indicate you want both processes run on this host.
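Concretely, that suggestion would change the earlier launch to something like the sketch below. Whether -ppn interacts with the colon-separated split launch this way depends on the MPI implementation, so this is a guess to experiment with, not a verified fix.

```shell
# Sketch: list the host once and ask for 2 processes per node, so the
# usdview rank and the worker rank both land on 192.168.0.16.
env OSPRAY_DEVICE=mpiOffload OSPRAY_LOAD_MODULES=mpi \
    PYTHONPATH=/opt/hdospray/lib/python LD_LIBRARY_PATH=/opt/hdospray/lib \
mpirun -hosts 192.168.0.16 -ppn 2 \
    -n 1 /opt/hdospray/bin/usdview --renderer OSPRay ~/temp/USD/Kitchen_set/Kitchen_set.usd \
  : -n 1 /opt/hdospray/bin/ospray_mpi_worker
```

The paths and IP are copied from the command earlier in the thread; only the host list and -ppn value differ.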
Closing the issue as the original poster didn't follow up.
I have built and used hdOSPRay to load and view the kitchen example on a local box.
I have used OSPRay+MPI indirectly via ParaView+pvserver (remote rendering).
How can I use usdview with hdOSPRay, but with OSPRay running on a cluster of MPI boxes?
Cheers