Open fkxie opened 5 years ago
Hi,
have you got NVIDIA drivers installed in your host machine, with a version high enough to be compatible with CUDA 9.0?
Do you see the GPU if you run this test command?
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Hi, thanks for your reply. I have the NVIDIA driver and CUDA 10.0 installed. 'docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi' does not work, but when I try 'docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi', it shows:

NVIDIA-SMI 410.48                 Driver Version: 410.48
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:1B:00.0 Off |                    0 |
| N/A   48C    P0    44W / 250W |  11174MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:1C:00.0 Off |                    0 |
| N/A   43C    P0    78W / 250W |  11650MiB / 32480MiB |     42%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  Off  | 00000000:1D:00.0 Off |                    0 |
| N/A   45C    P0    42W / 250W |  10802MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
I also turned on 'with_distibut' (set it to true), but it made no difference.
By the way, testing the example takes about 2.5 seconds per 100 steps.
So, my question is: how can I tell whether dp_train is being accelerated by the GPU? No GPU-related information is printed during training, yet when I run LAMMPS in this same Docker container, I do see output indicating GPU acceleration.
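One host-side way to answer this question is to poll GPU utilization and memory while dp_train is running: if both stay at zero, training is not on the GPU. This is a minimal sketch, not anything DeePMD-specific; it assumes only that the nvidia-smi binary is on the PATH, and returns an empty list otherwise.

```python
# Sketch: query GPU utilization and memory via nvidia-smi's CSV interface.
# Run this (or watch nvidia-smi) on the host while dp_train is training.
import subprocess


def gpu_utilization():
    """Return a (util_percent, mem_used_MiB) tuple per GPU, or [] if
    nvidia-smi is unavailable or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=utilization.gpu,memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [tuple(int(field) for field in line.split(","))
            for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    print(gpu_utilization())
```

On the nvidia-smi output quoted above, this would report utilizations like 0% and 42% with the corresponding memory figures; a dp_train job that is really on the GPU should make one of those utilization numbers climb.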
F.K.xie
When I run the dp_train with the deepmd GPU container on a Pascal server, I see either
# DEEPMD: gpu per node: [0]
or
# DEEPMD: gpu per node: [0, 1, 2, 3]
depending on whether I am using 1 or 4 GPUs of the server, so your dp_train output seems to suggest you're not seeing the GPUs.
Can you try and use this container that I built out of the Dockerfile in this repo:
marcodelapierre/deepmd-gpu:0.12.4_tf1.8_lmp_yz
and let me know how it goes?
OK, I'll try it.
Hi,
marcodelapierre/deepmd-gpu:0.12.4_tf1.8_lmp_yz
is exactly what I use now.
The problem I mentioned above remains: gpu per node: None.
But when I run dp_train and dp_frz, I can see some GPU-related information printed.
Hi, I want to use dp_train with gpu acceleration. My running command is:
But the output contains the line
# DEEPMD: gpu per node: None
So maybe I am not using GPU acceleration correctly. Is there something wrong? Please correct me.
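A quick way to narrow this down from inside the container is to ask TensorFlow (which DeePMD-kit is built on) which GPU devices it can see. This is a hedged sketch: it assumes only that TensorFlow is importable in that environment, and degrades to an empty list elsewhere.

```python
# Sketch: list the GPU devices TensorFlow can see. If TensorFlow sees no
# GPU inside the container, dp_train cannot use one either.
def visible_gpus():
    """Return TensorFlow-visible GPU device names, or [] if TF is absent."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []  # TensorFlow not installed in this environment
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    print("TensorFlow sees GPUs:", visible_gpus() or "none")
```

If this prints "none" inside the container while nvidia-smi on the host lists the GPUs, the problem is likely the container runtime setup (the --runtime=nvidia flag, or the driver/CUDA version pairing) rather than dp_train itself.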
Thanks.
F.K.xie