Open Hiroki11x opened 7 years ago
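slayton58 is a former colleague of mine, so I can put you in direct contact with him if there is anything you don't understand.
@rioyokota Really? I'd like to ask him whether ResNet50 training can be done in fp16 (and whether he has tried it).
I have a rough idea of what causes the error. What I care about more is this: training proceeds partway, but performance is poor (the GPU is not fully utilized, it is no faster than fp32, and ops such as SoftmaxWithLoss are not yet implemented or not yet released for fp16). I'd like to know whether that is the common understanding, so please connect us. (I know his GitHub account, so is it OK if I comment to him directly?)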
Multi-GPU training runs, but single-GPU training crashes:
INFO:resnet50_trainer:Training loss: 1.89116716385, accuracy: 0.1875
*** Aborted at 1502868249 (unix time) try "date -d @1502868249" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 72089 (TID 0x16000f9bf1b0) from PID 0; stack trace: ***
@ 0x100000050478 ([vdso]+0x477)
@ 0x10002455b908 (unknown)
@ 0x10002451c6ec (unknown)
@ 0x100024517178 (unknown)
@ 0x100012ebd98c (unknown)
@ 0x100012ce81d4 (unknown)
@ 0x100012d8a1ac (unknown)
@ 0x100012d8c4ec (unknown)
@ 0x100012e54498 (unknown)
@ 0x100012de4e88 (unknown)
@ 0x100012cf11ac (unknown)
@ 0x100012cf26f4 (unknown)
@ 0x100012be0b9c (unknown)
@ 0x100012be1378 (unknown)
@ 0x100012d83204 cuMemcpyAsync
@ 0x10000c48c72c (unknown)
@ 0x10000c461abc (unknown)
@ 0x10000c4a9f88 cudaMemcpyAsync
@ 0x10000a93ccc8 caffe2::CUDAContext::CopyBytes<>()
@ 0x10000a93e948 caffe2::Tensor<>::CopyFrom<>()
@ 0x10000a93f228 caffe2::ImageInputOp<>::CopyPrefetched()
@ 0x10000a93aa6c caffe2::PrefetchOperator<>::Run()
@ 0x100009f51924 caffe2::DAGNet::RunAt()
@ 0x100009f4cee8 caffe2::DAGNetBase::WorkerFunction()
@ 0x100009f51604 std::thread::_Impl<>::_M_run()
@ 0x10000051bdd4 (unknown)
@ 0x1000000b8728 start_thread
@ 0x10000034d210 __clone
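Most frames in the trace above are unresolved, so it can help to filter down to the symbolized ones. The following is a minimal sketch, not part of Caffe2 or glog; `FRAME_RE` and `symbolized_frames` are hypothetical names introduced here, and the sample is an excerpt of the trace above.

```python
import re

# Matches one glog stack-trace frame of the form "@ <hex address> <symbol>".
FRAME_RE = re.compile(r"^\s*@\s+(0x[0-9a-fA-F]+)\s+(.+?)\s*$")


def symbolized_frames(trace_text):
    """Return (address, symbol) pairs for frames whose symbol was resolved."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # not a frame line (e.g. the "PC:" or "***" lines)
        address, symbol = match.groups()
        if symbol == "(unknown)":
            continue  # skip frames without symbol information
        frames.append((address, symbol))
    return frames


# Excerpt of the trace above, innermost frame first.
SAMPLE = """\
PC: @ 0x0 (unknown)
@ 0x100012be1378 (unknown)
@ 0x100012d83204 cuMemcpyAsync
@ 0x10000c461abc (unknown)
@ 0x10000c4a9f88 cudaMemcpyAsync
@ 0x10000a93ccc8 caffe2::CUDAContext::CopyBytes<>()
@ 0x10000a93e948 caffe2::Tensor<>::CopyFrom<>()
@ 0x10000a93f228 caffe2::ImageInputOp<>::CopyPrefetched()
"""

for address, symbol in symbolized_frames(SAMPLE):
    print(address, symbol)
```

Read this way, the named frames show the segfault happening below cuMemcpyAsync, reached from caffe2::ImageInputOp<>::CopyPrefetched() on a DAGNet worker thread.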
When I run without nvprof, training succeeds. Does the crash depend on nvprof?
This problem occurs when I use the distributed stable version https://github.com/rioyokotalab/caffe2/tree/3a2e09674920fa9ac124a4facd6ef90e4eea1b47
If I use the version at the commit below,
commit c59f29163a15d0ccccb4a77db07f6f1da2757b76
Author: Yangqing Jia Yangqing@users.noreply.github.com
Date: Thu Aug 17 00:03:53 2017 -0700
Adios CNMEM. You will be remembered.
Summary:
As part of the cuda 9 move we have decided to deprecate the cnmem path
as it seems to be superceded by cub if one needs a memory pool.
Closes https://github.com/caffe2/caffe2/pull/1104
Differential Revision: D5647672
Pulled By: Yangqing
fbshipit-source-id: 988af5bf63e24efa1b631fd91ddb58e798ffc5c6
it is also unstable with respect to this nvprof problem.
I tried various things, following https://github.com/slayton58/caffe2/commit/e415b74e439e67c6d5c2a6d1061c516ee3335afa
I want to run
caffe2/caffe2/python/examples/resnet50_trainer.py
with fp16 on a P100.
Change
I changed
caffe2/caffe2/python/examples/resnet50_trainer.py
as follows:
add output_type='float16' to the brew.image_input arguments
using caffe2.python.modeling.initializers.pFP16Initializer, add pFP16Initializer to the brew.conv arguments
All changes are in https://github.com/rioyokotalab/models/commit/cc5f9a90a828fac4ad2b3eb403c42a4a24d42f6d
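For readers who don't want to open the commit, here is a rough sketch of what the two changes look like. It is not the actual diff. The helper names `add_fp16_image_input` and `add_fp16_conv` are hypothetical, `brew` is passed in as a parameter only so the snippet loads without Caffe2 installed, and passing the initializer through the `WeightInitializer` and `BiasInitializer` keywords is my assumption about how it is wired into `brew.conv`. The other arguments the trainer passes to `brew.image_input` are omitted.

```python
def add_fp16_image_input(brew, model, reader, batch_size, image_size):
    """Change 1: ask the image input op to emit float16 data."""
    return brew.image_input(
        model,
        reader,
        ["data", "label"],
        batch_size=batch_size,
        crop=image_size,
        output_type="float16",  # the added argument
    )


def add_fp16_conv(brew, model, initializer, blob_in, blob_out,
                  dim_in, dim_out, kernel):
    """Change 2: create the conv parameters through an fp16 initializer.

    `initializer` is expected to be pFP16Initializer; which keywords it is
    passed through is an assumption of this sketch.
    """
    return brew.conv(
        model,
        blob_in,
        blob_out,
        dim_in,
        dim_out,
        kernel,
        WeightInitializer=initializer,
        BiasInitializer=initializer,
    )


# With Caffe2 installed, the calls would look roughly like this
# (model and reader come from the trainer script; layer sizes are examples):
#   from caffe2.python import brew
#   from caffe2.python.modeling.initializers import pFP16Initializer
#   add_fp16_image_input(brew, model, reader, batch_size=32, image_size=224)
#   add_fp16_conv(brew, model, pFP16Initializer, "data", "conv1", 3, 64, 7)
```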
Execution
For intra-node parallel learning on a machine with four P100s, the following command is executed
Error
Machine environment