ospray / hdospray

Rendering plugin for Pixar's USD Hydra
Apache License 2.0

usdview with OSPray+MPI #14

Closed nyue closed 8 months ago

nyue commented 2 years ago

I have built and used hdOSPRay to load and view the kitchen example on a local box.

I have used OSPRay+MPI indirectly via ParaView+pvserver (remote rendering).

How can I use usdview with hdOSPRay, but with OSPRay running on a cluster of MPI boxes?

Cheers

BruceCherniak commented 2 years ago

I don't know the exact MPI command to get this working, but I can confirm that it does.

It will be something similar to the split-launch example in the MPI Offload Rendering documentation:

mpirun -n 1 ./ospExamples --osp:load-modules=mpi --osp:device=mpiOffload : -n <N> ./ospray_mpi_worker

But the --osp arguments won't be passed along to OSPRay this way.

@carsonbrownlee or @Twinklebear can likely help here.

johguenther commented 2 years ago

The above should work (replace ospExamples with usdview), because the command-line args are passed to ospInit.

nyue commented 2 years ago

My main binary (ospExamples/usdview) is running on the host head0.

My ospray_mpi_worker is running on the host compute0.

How do I include this compute0 hostname information, so that the interactive code runs on head0 while the MPI backend runs on a cluster whose main machine is compute0?

Cheers

carsonbrownlee commented 2 years ago

hi nyue, the above-mentioned command line uses an MPI split launch; the portion after the colon is run as a separate MPI process. However, the command-line arguments presented won't work for usdview. You can specify OSPRay-specific arguments through the HDOSPRAY_INIT_ARGS env var, or use the specific env vars I show below.
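For reference, the HDOSPRAY_INIT_ARGS route might look like the sketch below. The env var itself is documented by hdOSPRay; the specific --osp flags are taken from the MPI offload example above and are an assumption about what your setup needs:

```shell
# HDOSPRAY_INIT_ARGS forwards its contents to OSPRay at initialization
# (the --osp flags here mirror the split-launch example; adjust as needed).
export HDOSPRAY_INIT_ARGS="--osp:load-modules=mpi --osp:device=mpiOffload"
echo "$HDOSPRAY_INIT_ARGS"
```

You would then launch usdview under mpirun as usual, without needing the --osp arguments on the usdview command line.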

The exact command to run may depend on your MPI implementation, but may look like:

OSPRAY_LOAD_MODULES=mpi OSPRAY_DEVICE=mpiOffload mpirun -n 1 -host head0 -ppn 1 usdview --renderer OSPRay <path to USD file> : -n 1 -host compute0 <path to binary directory of ospray install>/ospray_mpi_worker

nyue commented 2 years ago

I am revisiting MPI and hdOSPRay; how can I debug further?

env OSPRAY_DEVICE=mpiOffload OSPRAY_LOAD_MODULES=mpi PYTHONPATH=/opt/hdospray/lib/python LD_LIBRARY_PATH=/opt/hdospray/lib mpirun -hosts 192.168.0.16,192.168.0.16 -n 1 -ppn 1 /opt/hdospray/bin/usdview --renderer OSPRay ~/temp/USD/Kitchen_set/Kitchen_set.usd : -n 1 /opt/hdospray/bin/ospray_mpi_worker
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 7533 RUNNING AT 192.168.0.16
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Running with the verbose flag generated more diagnostics but did not help me pinpoint the source of the problem.

[mpiexec@head0] Launch arguments: /usr/bin/hydra_pmi_proxy --control-port 192.168.0.16:34617 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0 
[proxy:0:0@head0] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@head0] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@head0] got pmi command (from 0): get_maxes

[proxy:0:0@head0] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@head0] got pmi command (from 0): get_appnum

[proxy:0:0@head0] PMI response: cmd=appnum appnum=0
[proxy:0:0@head0] got pmi command (from 0): get_my_kvsname

[proxy:0:0@head0] PMI response: cmd=my_kvsname kvsname=kvs_10185_0
[proxy:0:0@head0] got pmi command (from 0): get_my_kvsname

[proxy:0:0@head0] PMI response: cmd=my_kvsname kvsname=kvs_10185_0
[proxy:0:0@head0] got pmi command (from 0): barrier_in

[proxy:0:0@head0] forwarding command (cmd=barrier_in) upstream
[mpiexec@head0] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@head0] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:0@head0] PMI response: cmd=barrier_out
[proxy:0:0@head0] got pmi command (from 0): put
kvsname=kvs_10185_0 key=P0-businesscard value=description#192.168.0.16$port#49241$ifname#192.168.0.16$ 
[proxy:0:0@head0] cached command: P0-businesscard=description#192.168.0.16$port#49241$ifname#192.168.0.16$
[proxy:0:0@head0] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0@head0] got pmi command (from 0): barrier_in

[proxy:0:0@head0] flushing 1 put command(s) out
[proxy:0:0@head0] forwarding command (cmd=put P0-businesscard=description#192.168.0.16$port#49241$ifname#192.168.0.16$) upstream
[proxy:0:0@head0] forwarding command (cmd=barrier_in) upstream
[mpiexec@head0] [pgid: 0] got PMI command: cmd=put P0-businesscard=description#192.168.0.16$port#49241$ifname#192.168.0.16$
[mpiexec@head0] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@head0] PMI response to fd 6 pid 0: cmd=keyval_cache P0-businesscard=description#192.168.0.16$port#49241$ifname#192.168.0.16$ 
[mpiexec@head0] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:0@head0] PMI response: cmd=barrier_out
FATAL ERROR DURING INITIALIZATION!

carsonbrownlee commented 2 years ago

I would make sure your launch command works with OSPRay itself first. Try the same command with ospExamples. Typically the first host would be localhost, where the GUI should be launched, but beyond that it looks good to me. MPI can be tricky to debug. A typical method I use is to run "xterm -e gdb --args ..." for each process.
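Applied to the split launch above, the xterm/gdb trick might look like the following sketch. The host names and paths are carried over from the earlier commands; the exact mpirun flags depend on your MPI implementation, so treat this as illustrative rather than exact:

```shell
# Hypothetical debug variant of the split launch: each rank opens its own
# xterm with gdb attached, so crashes on either side can be inspected.
mpirun -n 1 -host head0 xterm -e gdb --args \
    /opt/hdospray/bin/usdview --renderer OSPRay Kitchen_set.usd : \
  -n 1 -host compute0 xterm -e gdb --args \
    /opt/hdospray/bin/ospray_mpi_worker
```

Type `run` in each gdb window to start the processes; a backtrace (`bt`) after the segfault should show which side is crashing.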

Twinklebear commented 2 years ago

Yeah, I'd second @carsonbrownlee's recommendation to try this command line with ospExamples, to check whether it's an issue in the environment or MPI stack on the system, or something in OSPRay. The diagnostic output seems to point to an issue during MPI setup.

One other thing might be worth checking, though I'm not quite sure how MPI is going to handle it. In your command:

mpirun -hosts 192.168.0.16,192.168.0.16 -n 1 -ppn 1

You list the same host IP twice and specify 1 process per node; I'm not sure whether MPI will see this as the same node or two nodes (since it's in the host list twice). That might cause the error. You could try listing the IP once and passing -ppn 2 to indicate you want both processes run on this host.
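That single-host variant might look like the sketch below. It reuses the paths and IP from the earlier command and assumes MPICH/Hydra-style -hosts and -ppn flags, as in the original invocation:

```shell
# Hypothetical single-host launch: the IP appears once, with -ppn 2 allowing
# both the usdview rank and the worker rank on the same machine.
env OSPRAY_DEVICE=mpiOffload OSPRAY_LOAD_MODULES=mpi \
  PYTHONPATH=/opt/hdospray/lib/python LD_LIBRARY_PATH=/opt/hdospray/lib \
  mpirun -hosts 192.168.0.16 -ppn 2 \
    -n 1 /opt/hdospray/bin/usdview --renderer OSPRay ~/temp/USD/Kitchen_set/Kitchen_set.usd : \
    -n 1 /opt/hdospray/bin/ospray_mpi_worker
```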

carsonbrownlee commented 8 months ago

Closing the issue, as the original poster didn't follow up.