This branch hosts the code for the technical report "Towards Good Practices for Very Deep Two-stream ConvNets", and more.
This branch adds a VideoDataLayer for inputting video data. Data loading is done in a prefetch() method, which is run in parallel with network processing.

Usage is generally the same as the original Caffe; please see the original README below. The following instructions cover how to use the features added in this branch; more detailed documentation is on the way.
The new VideoDataLayer supports multi-frame input; see the UCF101 sample for how to use it. Note that the VideoDataLayer can only read the optical-flow images generated by the tool listed above.
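For illustration only, a hypothetical sketch of such a layer definition in a network prototxt is given below. The field names inside video_data_param (source, batch_size, new_length, modality) are assumptions and are not specified in this README; the UCF101 sample in this repo is the authoritative reference.

```
# Hypothetical VideoDataLayer definition. The video_data_param fields below
# are assumptions for illustration; consult the UCF101 sample for the actual
# parameters supported by this branch.
layer {
  name: "data"
  type: "VideoData"                 # assumed type string for the new layer
  top: "data"
  top: "label"
  video_data_param {
    source: "train_flow_list.txt"   # assumed: list file of samples and labels
    batch_size: 32                  # samples per iteration on this device
    new_length: 10                  # assumed: number of stacked flow frames
    modality: FLOW                  # assumed: optical-flow input
  }
}
```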
Fixed corner cropping augmentation: set fix_crop to true in the transform_param of the network's protocol buffer definition (see the combined transform_param example below).
Multi-scale cropping augmentation is controlled by three transform_param fields (a combined example follows this list):

- Set multi_scale to true in transform_param.
- In transform_param, specify scale_ratios as a list of floats smaller than one; the default is [1, .875, .75, .65].
- In transform_param, specify max_distort as an integer, which limits the aspect-ratio distortion; the default is 1.
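Putting these options together, a transform_param block might look like the sketch below; it sits inside the data layer definition in the network prototxt. fix_crop, multi_scale, scale_ratios and max_distort are the fields described above, while crop_size and mirror are ordinary Caffe transform settings included only for context.

```
# Sketch of a transform_param combining the cropping augmentations above.
# crop_size and mirror are standard Caffe settings shown for context only.
transform_param {
  crop_size: 224
  mirror: true
  fix_crop: true                           # fixed corner cropping
  multi_scale: true                        # enable multi-scale cropping
  scale_ratios: [1, 0.875, 0.75, 0.65]     # defaults listed in this README
  max_distort: 1                           # aspect-ratio distortion limit (default 1)
}
```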
A new parameter, richness, specifies the total GPU memory in MB available to the cuDNN convolution engine as workspace; the default richness is 300 (300 MB). Using this parameter you can control the GPU memory consumption of training; the system will find the best setup under the memory limit for you.

Training with multiple GPUs is done through MPI, and the MPI library should be built with CUDA support (for OpenMPI, configure with --with-cuda). In the solver definition, specify the GPUs to use for training as a list of device IDs, like

device_id: [0,1,2,3]
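For context, a minimal solver sketch for 4-GPU training is shown below. Only the device_id line comes from this README; the remaining fields are ordinary Caffe solver settings with placeholder values, and the file paths are hypothetical.

```
# Sketch of a solver prototxt for 4-GPU training. Only device_id is specific
# to this branch; the other fields are standard Caffe solver settings with
# placeholder values, and the paths are hypothetical.
net: "models/your_model/train_val.prototxt"
base_lr: 0.001
lr_policy: "step"
stepsize: 10000
momentum: 0.9
weight_decay: 0.0005
max_iter: 40000
snapshot: 10000
snapshot_prefix: "models/your_model/snapshots/two_stream"
solver_mode: GPU
device_id: [0, 1, 2, 3]   # GPUs used for parallel training
```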
Compile the code with MPI support:

mkdir build && cd build
cmake .. -DUSE_MPI=ON
make && make install

Then use mpirun to launch the caffe executable, like

mpirun -np 4 ./install/bin/caffe train --solver=<Your Solver File> [--weights=<Pretrained caffemodel>]
Note: the actual batch size will be num_device times the batch_size specified in the network's prototxt (for example, 4 GPUs with batch_size: 16 give an effective batch size of 64).
Currently, all existing data layers sub-classed from BasePrefetchingDataLayer support parallel training. If you have a newly added layer which is also sub-classed from BasePrefetchingDataLayer, simply implement the virtual method
inline virtual void advance_cursor();
Its function should be to advance the "data cursor" in your data layer by one step. Your new layer will then be able to support parallel training; a minimal sketch is given below.
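As an illustration (not code from this repo), the sketch below shows how a custom layer derived from BasePrefetchingDataLayer could implement advance_cursor(). Only the method signature comes from this README; the class name, member names (lines_, lines_id_) and the wrap-around logic are hypothetical, modeled on how Caffe's list-based data layers typically track their reading position.

```cpp
// Hypothetical example; only the advance_cursor() signature is from this README.
#include <string>
#include <utility>
#include <vector>

#include "caffe/data_layers.hpp"  // assumed header providing BasePrefetchingDataLayer

namespace caffe {

template <typename Dtype>
class MyVideoListLayer : public BasePrefetchingDataLayer<Dtype> {
 public:
  explicit MyVideoListLayer(const LayerParameter& param)
      : BasePrefetchingDataLayer<Dtype>(param), lines_id_(0) {}

 protected:
  // Move the internal "data cursor" forward by one step, as required for
  // parallel training support.
  inline virtual void advance_cursor() {
    lines_id_++;                        // hypothetical index into the sample list
    if (lines_id_ >= lines_.size()) {   // wrap around at the end of an epoch
      lines_id_ = 0;
    }
  }

  // ... other required layer methods (setup, load_batch, etc.) omitted here.

  std::vector<std::pair<std::string, int> > lines_;  // hypothetical (path, label) list
  size_t lines_id_;                                   // hypothetical cursor position
};

}  // namespace caffe
```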
Citation

You are encouraged to also cite one of the following papers if you find this repo helpful:
@inproceedings{TSN2016ECCV,
author = {Limin Wang and
Yuanjun Xiong and
Zhe Wang and
Yu Qiao and
Dahua Lin and
Xiaoou Tang and
Luc {Van Gool}},
title = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
booktitle = {ECCV},
year = {2016},
}
@article{MultiGPUCaffe2015,
author = {Limin Wang and
Yuanjun Xiong and
Zhe Wang and
Yu Qiao},
title = {Towards Good Practices for Very Deep Two-Stream ConvNets},
journal = {CoRR},
volume = {abs/1507.02159},
year = {2015},
url = {http://arxiv.org/abs/1507.02159},
}
The following is the original README of Caffe.
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.
Check out the project site for all the details and step-by-step examples.
Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.
Happy brewing!
Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.
Please cite Caffe in your publications if it helps your research:
@article{jia2014caffe,
Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
Journal = {arXiv preprint arXiv:1408.5093},
Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
Year = {2014}
}