I want to train VGG16_ImageNet_Distributed.py on multiple nodes using mpiexec (two GPUs on one node),
so I followed the instructions at https://docs.microsoft.com/en-us/cognitive-toolkit/Multiple-GPUs-and-machines.
Training on one node worked well. However, it did not work when I trained on multiple nodes.
The other node fails with an error saying it cannot import the numpy or cntk module.
Since I train through Anaconda, I changed the default Python path to Anaconda, but that did not solve the problem.
Is there a solution to this problem?
The errors are as follows:
**Traceback (most recent call last):
  File "/home/cslee/cntk/Examples/Image/Classification/VGG/Python/VGG16_modify_2.py", line 11, in <module>
    import numpy as np
ImportError: No module named numpy
Traceback (most recent call last):
  File "/home/cslee/cntk/Examples/Image/Classification/VGG/Python/VGG16_modify_2.py", line 11, in <module>
    import numpy as np
ImportError: No module named numpy
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
  Process name: [[62073,1],2]
  Exit code: 1**
CNTK and its dependencies (such as numpy) need to be available without activating a special environment, because when mpid invokes the Python process on the other node, it runs in the default conda environment, not in a specific cntk environment.
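One way to confirm which interpreter each MPI process actually gets is a small diagnostic script. The following is only a sketch: the function name `describe_environment` and the choice of modules to probe are my own assumptions, not part of CNTK or MPI. It uses only the standard library, so it should run even under an interpreter that lacks numpy.

```python
import socket
import sys


def describe_environment(modules=("numpy", "cntk")):
    """Report the host, the interpreter in use, and whether each module imports.

    Hypothetical helper for illustration; not part of CNTK or MPI.
    """
    report = {
        "host": socket.gethostname(),
        "python": sys.executable,  # full path of the interpreter that was launched
        "prefix": sys.prefix,      # installation or environment root of that interpreter
    }
    for name in modules:
        try:
            __import__(name)
            report[name] = "ok"
        except Exception as exc:  # ImportError, or a failure while loading native libraries
            report[name] = "failed: %r" % (exc,)
    return report


if __name__ == "__main__":
    # Each MPI process prints one line, so the nodes can be compared side by side.
    print(describe_environment())
```

Launching this with the same mpiexec arguments as the training job, substituting it for the training script, should show whether the processes on the other node run the Anaconda interpreter or a different Python that has no numpy or cntk installed.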