Closed MSutt closed 5 months ago
Hi, I'm a new SSD learner, and I am training on my own data to detect only the person class, but I got some errors when testing. Maybe I need to fine-tune the model, but I cannot find the finetune_ssd_pascal_512.py file, so I want to ask where it is. Thank you very much!
You can find finetune_ssd_pascal_512.py
in the 07++12+COCO model folder linked at the end of README.md.
You could set debug_info to true in the script and check which layer takes a long time. It could be the annotated data layer as well.
The K80 is slow as well, and you are only using one GPU, which could also be the reason. You could reduce the batch size, but then you have to tune the learning rate to get comparable performance.
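As a concrete illustration of that last point, one common heuristic (not stated in this thread) is to scale the learning rate linearly with the batch size. A minimal sketch, with illustrative base values that are not taken from the repo's configs:

```python
# Hedged sketch: linear learning-rate scaling when shrinking the batch size.
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Scale lr in proportion to the batch size (linear scaling heuristic)."""
    return base_lr * new_batch_size / base_batch_size

# If a reference run used batch size 32 at lr 0.001 (assumed values),
# a batch-size-8 run would start from lr 0.00025:
print(scale_lr(0.001, 32, 8))
```

This is only a starting point; the resulting lr still needs validation on the actual training run.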
Thanks for the answer, I will activate debug info to see which layer takes a long time. Can you give an example of your training speed?
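(For reference, turning this on in a plain Caffe solver prototxt is a one-line change; `debug_info` is a standard `SolverParameter` field. A minimal fragment:)

```
# solver.prototxt (fragment) -- print per-layer forward/backward stats
debug_info: true
```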
After activating debug info, here is what happens during 10 iterations of my training.
As before, my 10 iterations take about 2 minutes, and the first step, reading the data, takes almost all of this time.
The second step taking time is the mbox_priorbox layer, which takes about 6 seconds. That's no big deal compared to the 2 minutes spent reading data.
I0519 11:00:58.664758 7715 solver.cpp:243] Iteration 12000, loss = 0.69167
I0519 11:00:58.664795 7715 solver.cpp:259] Train net output #0: mbox_loss = 0.977279 (* 1 = 0.977279 loss)
I0519 11:00:58.664825 7715 sgd_solver.cpp:138] Iteration 12000, lr = 0.001
I0519 11:03:00.699906 7715 net.cpp:608] [Forward] Layer data, top blob data data: 40.1579
I0519 11:03:00.700281 7715 net.cpp:608] [Forward] Layer data, top blob label data: 3.86338
...
...
...
I0519 11:03:02.182437 7715 net.cpp:608] [Forward] Layer mbox_conf, top blob mbox_conf data: 2.22728
I0519 11:03:02.185241 7715 net.cpp:608] [Forward] Layer mbox_priorbox, top blob mbox_priorbox data: 0.326064
I0519 11:03:08.926136 7715 net.cpp:608] [Forward] Layer mbox_loss, top blob mbox_loss data: 0.66974
I0519 11:03:09.444665 7715 net.cpp:636] [Backward] Layer mbox_loss, bottom blob mbox_loc diff: 1.10698e-06
...
...
...
I0519 11:03:12.308850 7715 solver.cpp:243] Iteration 12010, loss = 0.479683
I0519 11:03:12.308941 7715 solver.cpp:259] Train net output #0: mbox_loss = 0.66974 (* 1 = 0.66974 loss)
I0519 11:03:12.309013 7715 sgd_solver.cpp:138] Iteration 12010, lr = 0.001
Is the loading time normal? Can I reduce it?
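For scale, the figures quoted above (batch size 8, roughly 2 minutes per 10 iterations) work out to well under one image per second. Illustrative arithmetic, not profiler output:

```python
# Throughput implied by ~2 min per 10 iterations at batch size 8.
batch_size = 8
iterations = 10
elapsed_s = 120.0
images_per_s = batch_size * iterations / elapsed_s
print(round(images_per_s, 2))  # -> 0.67
```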
Here is my data layer (I removed some batch samplers to train only on the original images):
layer {
  name: "data"
  type: "AnnotatedData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    resize_param {
      prob: 1.0
      resize_mode: FIT_LARGE_SIZE_AND_PAD
      height: 512
      width: 512
      interp_mode: LINEAR
      interp_mode: AREA
      interp_mode: NEAREST
      interp_mode: CUBIC
      interp_mode: LANCZOS4
    }
    emit_constraint {
      emit_type: CENTER
    }
    distort_param {
      brightness_prob: 0.5
      brightness_delta: 32.0
      contrast_prob: 0.5
      contrast_lower: 0.5
      contrast_upper: 1.5
      hue_prob: 0.5
      hue_delta: 18.0
      saturation_prob: 0.5
      saturation_lower: 0.5
      saturation_upper: 1.5
      random_order_prob: 0.0
    }
    expand_param {
      prob: 0.5
      max_expand_ratio: 4.0
    }
  }
  data_param {
    source: "examples/mydataset/mydataset_train_lmdb"
    batch_size: 8
    backend: LMDB
  }
  annotated_data_param {
    batch_sampler {
      max_sample: 1
      max_trials: 1
    }
    label_map_file: "data/mydataset/labelmap_voc.prototxt"
  }
}
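For intuition, FIT_LARGE_SIZE_AND_PAD scales the image so its larger side fits the target and pads the rest to a square. A minimal Python sketch of my reading of that resize mode (not code from the repo):

```python
def fit_large_size_and_pad(w, h, target=512):
    """Scale so the larger side equals `target`, then pad to a square."""
    scale = target / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    return (new_w, new_h), (target - new_w, target - new_h)

# A 1920x1080 frame becomes 512x288 content plus 224 rows of padding:
print(fit_large_size_and_pad(1920, 1080))  # -> ((512, 288), (0, 224))
```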
@weiliu89 Yeah, my training process is too slow too. It can only run about 14000 iterations in one day. Is there any solution? Thank you!
@mxmxlwlw Hi, I'm currently training my custom dataset on a GTX 1060; it takes about 1 min to train 10 iterations. What's your GPU device?
@IEEE-FELLOW Hahaha, you are lucky! My GPU is a GTX 1080. You just need to comment out the expand_param in the net prototxt. If your training images are big, it will slow you down!
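A back-of-the-envelope check of why expand_param hurts with large frames (illustrative arithmetic, using the max_expand_ratio: 4.0 from the data layer above and a hypothetical 1920x1080 input):

```python
# expand_param pastes the image onto a canvas up to max_expand_ratio times
# larger per side, so the worst case touches ratio**2 as many pixels.
w, h, max_ratio = 1920, 1080, 4.0
worst_case_pixels = (w * max_ratio) * (h * max_ratio)
print(int(worst_case_pixels / (w * h)))  # -> 16
```

So with a ratio of 4 the augmentation can process up to 16x the original pixel count per image, all on the CPU side of the data pipeline.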
@mxmxlwlw Wow! I just changed the code as you said, and it takes 14 s to train 10 iterations! Currently the loss is around 3. My training images are 1920*1080 and the detection targets are quite small; do you have any idea how to detect small targets? Many thanks!
@IEEE-FELLOW To detect small targets you need to predict boxes before too much pooling is applied to the features, and the input size should be as big as you can afford. You can use a kernel of stride 4 and size 7 in the first conv layer to reduce the computation.
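To put numbers on the stride suggestion, here is the standard convolution output-size arithmetic with hypothetical layer sizes (a 512x512 input and a padded 7x7 stem, neither taken from the repo's configs):

```python
def conv_out(in_size, kernel, stride, pad):
    """Spatial output size of a convolution."""
    return (in_size + 2 * pad - kernel) // stride + 1

s2 = conv_out(512, kernel=7, stride=2, pad=3)  # typical stride-2 stem
s4 = conv_out(512, kernel=7, stride=4, pad=3)  # stride-4 variant suggested above
print(s2, s4)  # -> 256 128
```

Halving the output side again means every downstream layer processes roughly 4x fewer activations, at the cost of coarser early features.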
@mxmxlwlw Thanks, I'll have a try.
First of all, thanks for this amazing work.
Issue summary
I am training an SSD 512 on my own dataset; the training time for 10 iterations is about 2 minutes, as shown next. I consider this slow compared to training other networks. Is this a normal speed, or is something wrong? (I am training on a Tesla K80.) How fast should training be on the different models (300 / 512) and on different configurations (one / multiple GPUs)?
Steps to reproduce
Data
I created my LMDB using create_data.sh with max_dim=512 because I want my images to keep their aspect ratio.
Network
To train the network I used the finetune_ssd_pascal_512.py file coming from the 07++12+COCO model. I removed 'mirror': True, and 'mean_value': [104, 117, 123], from both the train and test transform params. I also changed the resize mode to resize_mode: FIT_LARGE_SIZE_AND_PAD. Of course I adapted all paths to my dataset and model, changed the number of classes, changed gpus = "0,1,2,3" to gpus = "0" because I have only one GPU, and modified the base LR.
Your system configuration
I am training on a Google Compute server with a Tesla K80.
Operating system: Ubuntu 14.04
Compiler: 4.8.4
CUDA version: release 7.0, V7.0.27
CUDNN version: 4.0.7
BLAS: atlas
Python version: 2.7