microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Train ResNet #3767

Open jieunko opened 4 years ago

jieunko commented 4 years ago

I am running the ResNet example; however, no .model file is generated when I run TrainResNet_CIFAR10_Distributed.py. Is there any required setting to run this code?

akamotaco commented 4 years ago

I have tested it with a single GPU (CNTK 2.7 and Python 3.6.9). The model file is generated in the ".\Models" path after training ends (or is aborted).

Check these, please.

  1. Check that all 3 files are present (resnet_models.py / TrainResNet_CIFAR10.py / TrainResNet_CIFAR10_Distributed.py)
  2. Get the data (CIFAR-10)
  3. Check (or change) the data path (default path is '......\DataSet\CIFAR-10.)
  4. Check that the mpiexec.exe file is available
  5. Run training (".\Models" path generated)
  6. Check the model file. (.\Models\ResNet_CIFAR10_DataAug.model)

> mpiexec.exe -n 1 python TrainResNet_CIFAR10_Distributed.py -n resnet20 -q 1 -a 50000

ping [requestnodes (before change)]: 1 nodes pinging each other
ping [requestnodes (after change)]: 1 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 1 out of 1 MPI nodes on a single host (1 requested); we (0) are in (participating)
ping [mpihelper]: 1 nodes pinging each other
-------------------------------------------------------------------
Build info:

                Built time: Apr 23 2019 21:50:08
                Last modified date: Tue Apr 23 17:37:55 2019
                Build type: Release
                Build target: GPU
                With ASGD: yes
                Math lib: mkl
                CUDA version: 10.0.0
                CUDNN version: 7.3.1
                Build Branch: HEAD
                Build SHA1: ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
Selected GPU[0] GeForce GTX 1070 Ti as the process wide default device.
(Aborted with Ctrl+C)
mpiexec aborting job...

job aborted:
[ranks] message

[0] job terminated by the user
...
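The checklist above can be sketched as a small pre-flight script. This is only an illustration, not part of the CNTK examples: the helper names `preflight` and `model_was_written` are hypothetical, and it assumes the three scripts sit in one folder and that the launcher is expected on PATH. It does not check the data path, since that depends on your local layout.

```python
import os
import shutil

# File names come from the checklist above; keeping them in one folder is an assumption.
REQUIRED_FILES = (
    "resnet_models.py",
    "TrainResNet_CIFAR10.py",
    "TrainResNet_CIFAR10_Distributed.py",
)


def preflight(example_dir=".", launcher="mpiexec"):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for name in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(example_dir, name)):
            problems.append("missing file: " + name)
    # shutil.which searches PATH; on Windows it also tries the PATHEXT
    # extensions, so "mpiexec" can resolve to mpiexec.exe.
    if shutil.which(launcher) is None:
        problems.append(launcher + " not found on PATH")
    return problems


def model_was_written(example_dir=".", model_name="ResNet_CIFAR10_DataAug.model"):
    """Check for the model file under the Models folder named in the checklist."""
    return os.path.isfile(os.path.join(example_dir, "Models", model_name))


if __name__ == "__main__":
    # Hypothetical usage: run from the folder that holds the example scripts.
    for problem in preflight():
        print(problem)
```

Running it before the `mpiexec.exe` command should list anything missing. Calling `model_was_written()` after training ends (or is aborted) should confirm whether the model file appeared.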

jieunko commented 4 years ago

Thanks, I didn't have mpiexec.exe.