Closed (gaowayne closed 3 months ago)
Guys, I installed NVIDIA Docker on Fedora and can now start the container, but the next step fails with the error below. How can I fix this?
root@6ec7b9c99e06:/# ls
bin boot data dev etc home lib lib64 media mnt opt proc raw_data results root run sbin srv sys tmp usr var workspace
root@6ec7b9c99e06:/# cd workspace/unet3d/
root@6ec7b9c99e06:/workspace/unet3d# python3 preprocess_dataset.py --data_dir /raw_data --results_dir /data
Preprocessing /raw_data
/opt/conda/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3464: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/opt/conda/lib/python3.8/site-packages/numpy/core/_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Mean value: nan, std: nan, d: nan, h: nan, w: nan
Traceback (most recent call last):
  File "preprocess_dataset.py", line 147, in <module>
    verify_dataset(args.results_dir)
  File "preprocess_dataset.py", line 127, in verify_dataset
    assert len(source) == len(os.listdir(results_dir))
AssertionError
root@6ec7b9c99e06:/workspace/unet3d#
Guys, I reinstalled the host OS with Ubuntu 22.04 and I still see this error. Could you please shed some light?
dcg@oq1:/mnt/nvme1n1/mlperf/ubuntu/training/image_segmentation/pytorch$ sudo docker run --ipc=host -it --rm --runtime=nvidia -v /mnt/nvme1n1/mlperf/ubuntu/training/image_segmentation/pytorch/raw-data-dir:/raw_data -v /mnt/nvme1n1/mlperf/ubuntu/training/image_segmentation/pytorch/data:/data -v /mnt/nvme1n1/mlperf/ubuntu/training/image_segmentation/pytorch/results:/results unet3d:latest /bin/bash
root@7f2d8fc3d617:/workspace/unet3d# ls
Dockerfile LICENCE README.md checksum.json data_loading evaluation_cases.txt main.py model oldREADME.md preprocess_dataset.py requirements.txt run_and_time.sh runtime
root@7f2d8fc3d617:/workspace/unet3d# python3 preprocess_dataset.py --data_dir /raw_data --results_dir /data
Preprocessing /raw_data
/opt/conda/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3464: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/opt/conda/lib/python3.8/site-packages/numpy/core/_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Mean value: nan, std: nan, d: nan, h: nan, w: nan
Traceback (most recent call last):
  File "preprocess_dataset.py", line 147, in <module>
    verify_dataset(args.results_dir)
  File "preprocess_dataset.py", line 127, in verify_dataset
    assert len(source) == len(os.listdir(results_dir))
AssertionError
root@7f2d8fc3d617:/workspace/unet3d#
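One possible reading of the log above: the "Mean of empty slice" warning and the nan statistics suggest that the script found no cases under /raw_data, so nothing was written to /data and the file-count check in verify_dataset failed. The sketch below is a quick check that could be run inside the container before preprocessing. It assumes the KiTS19 layout, where each case sits in its own case_* directory directly under the data directory. The helper name find_case_dirs and the "case_" prefix are illustrative and are not taken from preprocess_dataset.py.

```python
import os


def find_case_dirs(data_dir, prefix="case_"):
    """Return the sorted names of sub-directories of data_dir that start with prefix.

    Assumption: raw cases are stored as <data_dir>/case_XXXXX/ directories.
    """
    return sorted(
        name
        for name in os.listdir(data_dir)
        if name.startswith(prefix)
        and os.path.isdir(os.path.join(data_dir, name))
    )


def describe(data_dir):
    """Build a short one-line report on what the mounted directory contains."""
    cases = find_case_dirs(data_dir)
    if cases:
        return f"{len(cases)} case directories found in {data_dir}"
    # No cases at this level: list what is there, to spot a wrong mount level.
    entries = sorted(os.listdir(data_dir))[:10]
    return f"no case directories in {data_dir}; entries present: {entries}"
```

Calling describe("/raw_data") inside the container would show whether the mount exposes the case directories. If it reports no case directories but lists a single sub-folder, the -v option probably points one or more levels above the directory that actually holds the cases.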
Sorry, but the unet3d benchmark has been dropped from the training benchmark suite, so this issue cannot be addressed at this time.
The guide I am following is image_segmentation/pytorch.
When I try to run the container, I get the error below, which says that the nvidia runtime does not exist. Could you please shed some light?
I am using Fedora 37. I failed to install CUDA container support because the install script does not support Fedora.
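For the "runtime does not exist" message, one thing that may be worth checking is whether Docker has a runtime named nvidia registered at all. The sketch below assumes the usual setup in which the NVIDIA Container Toolkit adds a "runtimes" entry to the Docker daemon configuration (commonly /etc/docker/daemon.json on Linux). The function name has_nvidia_runtime is made up for illustration, and the path may differ on a given system.

```python
import json


def has_nvidia_runtime(daemon_json_text):
    """Return True if the daemon config text registers a runtime named 'nvidia'.

    An empty or whitespace-only config is treated as having no runtimes.
    """
    if not daemon_json_text.strip():
        return False
    config = json.loads(daemon_json_text)
    return "nvidia" in config.get("runtimes", {})


# Example of the kind of entry that is expected to be present (illustrative).
EXAMPLE_CONFIG = """
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
"""
```

On the host, the text of the daemon configuration file can be read and passed to has_nvidia_runtime. If it returns False, --runtime=nvidia has nothing to resolve to, which would be consistent with the error described above. The Docker daemon has to be restarted after the file is changed.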