Hiroki11x closed this 7 years ago
$ PYTHONPATH=/usr/local python caffe2/python/examples/resnet50_trainer.py --train_data /path/to/ilsvrc12_train_lmdb
Ignoring @/caffe2/caffe2/contrib/nccl:nccl_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops_gpu as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:file_store_handler_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:redis_store_handler_ops as it is not a valid file.
INFO:resnet50_trainer:Running on GPUs: [0]
INFO:resnet50_trainer:Using epoch size: 1500000
Traceback (most recent call last):
File "caffe2/python/examples/resnet50_trainer.py", line 462, in <module>
main()
File "caffe2/python/examples/resnet50_trainer.py", line 458, in main
Train(args)
File "caffe2/python/examples/resnet50_trainer.py", line 301, in Train
data_parallel_model.Parallelize(
AttributeError: 'module' object has no attribute 'Parallelize'
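One possible reading of this traceback (an assumption, not confirmed in the thread) is that with `PYTHONPATH=/usr/local` Python imports an older installed copy of `caffe2.python.data_parallel_model` that does not define `Parallelize`, while `PYTHONPATH=.` picks up the freshly built tree. A minimal sketch for checking which copy wins and whether it has the attribute; the helper name `locate_attribute` is hypothetical:

```python
import importlib


def locate_attribute(module_name, attr):
    """Import module_name; return (file it was loaded from, whether it defines attr)."""
    module = importlib.import_module(module_name)
    # __file__ shows which copy on sys.path was actually imported
    # (e.g. an install under /usr/local vs. the local build directory).
    location = getattr(module, "__file__", None)
    return location, hasattr(module, attr)


if __name__ == "__main__":
    # Hypothetical usage for this issue: run once per PYTHONPATH setting and compare.
    try:
        path, has_it = locate_attribute("caffe2.python.data_parallel_model", "Parallelize")
        print("%s has Parallelize: %s" % (path, has_it))
    except ImportError as exc:
        print("caffe2 not importable: %s" % exc)
```

Running it as `PYTHONPATH=/usr/local python check.py` and then `PYTHONPATH=. python check.py` would show whether the two invocations resolve to different files.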
$ # Built it myself (at runtime it was picking up cuDNN 5, so I ran apt-get autoremove libcudnn5)
$ PYTHONPATH=. python caffe2/python/examples/resnet50_trainer.py --train_data /path/to/ilsvrc12_train_lmdb
Ignoring @/caffe2/caffe2/contrib/nccl:nccl_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops_gpu as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:file_store_handler_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:redis_store_handler_ops as it is not a valid file.
INFO:resnet50_trainer:Running on GPUs: [0]
INFO:resnet50_trainer:Using epoch size: 1500000
INFO:data_parallel_model:Parallelizing model for devices: [0]
INFO:data_parallel_model:Create input and model training operators
INFO:data_parallel_model:Model for GPU : 0
INFO:data_parallel_model:Adding gradient operators
INFO:data_parallel_model:Add gradient all-reduces for SyncSGD
INFO:data_parallel_model:Post-iteration operators for updating params
INFO:data_parallel_model:Calling optimizer builder function
INFO:data_parallel_model:Add initial parameter sync
WARNING:data_parallel_model:------- DEPRECATED API, please use data_parallel_model.OptimizeGradientMemory() -----
WARNING:memonger:NOTE: Executing memonger to optimize gradient memory
INFO:memonger:Remapping 111 blobs, using 14 shared
INFO:memonger:Memonger memory optimization took 0.0161368846893 secs
INFO:resnet50_trainer:Starting epoch 0/1000
INFO:resnet50_trainer:Finished iteration 1/46875 of epoch 0 (25.48 images/sec)
INFO:resnet50_trainer:Training loss: 7.22864294052, accuracy: 0.0
INFO:resnet50_trainer:Finished iteration 2/46875 of epoch 0 (107.79 images/sec)
INFO:resnet50_trainer:Training loss: 21.5477619171, accuracy: 0.0
INFO:resnet50_trainer:Finished iteration 3/46875 of epoch 0 (112.01 images/sec)
INFO:resnet50_trainer:Training loss: 17.5498409271, accuracy: 0.0
INFO:resnet50_trainer:Finished iteration 4/46875 of epoch 0 (112.40 images/sec)
INFO:resnet50_trainer:Training loss: 25.3153591156, accuracy: 0.0
(omitted)
$ nvidia-docker run -it --rm -v /path/to/ilsvrc12_train_lmdb:/data caffe2ai/caffe2 python caffe2/python/examples/resnet50_trainer.py --train_data /data
Ignoring @/caffe2/caffe2/contrib/nccl:nccl_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops_gpu as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:file_store_handler_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:redis_store_handler_ops as it is not a valid file.
INFO:resnet50_trainer:Running on GPUs: [0]
INFO:resnet50_trainer:Using epoch size: 1500000
INFO:data_parallel_model:Parallelizing model for devices: [0]
INFO:data_parallel_model:Create input and model training operators
INFO:data_parallel_model:Model for GPU: 0
INFO:data_parallel_model:Adding gradient operators
INFO:data_parallel_model:Add gradient all-reduces for SyncSGD
INFO:data_parallel_model:Post-iteration operators for updating params
INFO:data_parallel_model:Add initial parameter sync
WARNING:memonger:NOTE: Executing memonger to optimize gradient memory
INFO:memonger:Remapping 128 blobs, using 19 shared
INFO:memonger:Memonger optimization took 0.00878810882568 secs
INFO:resnet50_trainer:Starting epoch 0/1000
INFO:resnet50_trainer:Finished iteration 1/46875 of epoch 0 (7.29 images/sec)
INFO:resnet50_trainer:Finished iteration 2/46875 of epoch 0 (119.08 images/sec)
INFO:resnet50_trainer:Finished iteration 3/46875 of epoch 0 (120.80 images/sec)
INFO:resnet50_trainer:Finished iteration 4/46875 of epoch 0 (121.48 images/sec)
(omitted)
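To compare the self-built run with the Docker run, the "Finished iteration" lines can be parsed and averaged. This is a minimal sketch that assumes the log format shown above and treats iteration 1 as warm-up (it is much slower in both runs); the function name `mean_throughput` is hypothetical:

```python
import re

# Matches trainer log lines such as:
# INFO:resnet50_trainer:Finished iteration 2/46875 of epoch 0 (107.79 images/sec)
ITER_RE = re.compile(r"Finished iteration (\d+)/\d+ of epoch \d+ \(([\d.]+) images/sec\)")


def mean_throughput(log_text, skip_first=1):
    """Average images/sec over logged iterations numbered above skip_first (warm-up excluded)."""
    rates = [float(m.group(2)) for m in ITER_RE.finditer(log_text)
             if int(m.group(1)) > skip_first]
    # Return None when no matching iterations were found.
    return sum(rates) / len(rates) if rates else None
```

On the three post-warm-up iterations shown, this gives roughly 110.7 images/sec for the self-built run and roughly 120.5 images/sec for the Docker image; three iterations are far too few to draw conclusions from.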
The Docker image is v0.6.0, so it is a bit old.
At first I thought the protobuf version was the cause, but it wasn't, so I'm not sure what the problem was.
@sekiya-a I see you had already replied; sorry I didn't notice. After updating to the latest version, the build succeeded.