Open uprush opened 6 months ago
Use RDMAV_FORK_SAFE=1 ./benchmark.sh run --workload unet3d --num-accelerators 8 --results-dir /mnt/fb1/unet3d_results --param dataset.data_folder=/mnt/fb1/unet3d_data --param dataset.num_subfolders_train=16 --param dataset.num_files_train=4687
Hi,
The benchmark failed with the following error when running on Kubernetes. I was able to work around it by setting the environment variable RDMAV_FORK_SAFE=0, but I am not sure whether this has any performance impact or causes other issues.
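
For anyone trying the same workaround, here is a minimal sketch of one way to apply it. It assumes the variable is exported in the shell (or container entrypoint) that launches the benchmark, so that child processes inherit it. The value is the one mentioned in this post. The check step is only a sanity test, not part of the benchmark.

```shell
# Assumption: exporting the variable in the launching shell is enough
# for every child process of the benchmark to inherit it.
export RDMAV_FORK_SAFE=0

# Sanity check: confirm a child process actually sees the value.
sh -c 'echo "RDMAV_FORK_SAFE=${RDMAV_FORK_SAFE}"'

# Then start the usual ./benchmark.sh run command from this same shell.
```

On Kubernetes the same value could instead be set through the `env` field of the container spec, which avoids changing the launch script.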