Do you have a log file for the train_end2end.py? Should I change the lr or other parameters when I use multi-gpus? #16

jshtok / RepMet

Few-shot detection for visual categories

Apache License 2.0

Closed yang-yk closed 4 years ago

yang-yk commented 4 years ago

Hello, jshtok. When I try to reproduce the work, the model seems to train well, but the test results are not as good as those reported in the paper. Do you have a training log file you could share? Also, should I change parameters such as lr when I train on multiple GPUs?

Hope to hear from you soon! Sincerely, Yukuan Yang
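
On the multi-GPU lr question, one common heuristic (not confirmed anywhere in this thread as what RepMet uses) is the linear scaling rule: when the effective batch size grows by a factor k, multiply the learning rate by k. Below is a minimal sketch of that arithmetic. The function name `linear_scaled_lr` and the numbers in the usage lines are hypothetical and are not taken from the repository's configs.

```python
def linear_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale a learning rate in proportion to the effective batch size.

    This is the generic linear scaling heuristic, not a setting taken
    from train_end2end.py or from the RepMet configs.
    """
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch / base_batch


# Hypothetical numbers: a config tuned for 4 GPUs with 1 image per GPU,
# reused on 2 GPUs with 1 image per GPU.
reference_lr = 0.01  # illustrative value only
lr_for_two_gpus = linear_scaled_lr(reference_lr, base_batch=4, new_batch=2)
print(lr_for_two_gpus)
```

Whether this rule holds for a given detector still has to be checked empirically, and a warm-up phase is often combined with it.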

duynn912 commented 4 years ago

Hi @yang-yk, what accuracy did you get? I am also reproducing the work. I trained on 2 GPUs and got 77.4% when testing with 1-shot, 5-way, 500 episodes for pascal and inloc. It seems the code only writes the log and does not save any results, e.g. as a pkl file.

yang-yk commented 4 years ago

77.4%? Did you train the model on the imagenet-loc dataset? Did you change the parameter settings? How many epochs did you train for? Your result seems even better than what the paper reported. I have found some errors in the code which may affect the accuracy, but I don't think I could get such a high mAP. Also, how long does one training run take? I trained the model with 4 GPUs and it took almost 80 hours.

duynn912 commented 4 years ago

Hi @yang-yk, sorry, I gave the wrong number. 77.4 is just the number that the logger prints on the command screen. I do not know how to output the accuracy separately for pascal and inloc. Do you know how? By the way, I only trained my model on 2 GPUs, and it took 4 days.

yang-yk commented 4 years ago

You can run few_shot_benchmark.py for testing; it prints the accuracy. They only test the model on the imagenet-loc dataset, not on the pascal voc dataset. To test your own model, replace the model at the path './data/image_loc/xx.params' with it. Also remember to change the load_a_model function in few_shot_benchmark.py.

duynn912 commented 4 years ago

Yup, I replaced that model with my own, which was trained on pascal and inloc, and ran few_shot_benchmark.py. 77.4% is the result for 1-shot, 5-way, 500 episodes. I do not know whether this result is for pascal or inloc. By the way, do you know how to save detection results as pickle files, as in Detectron?
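
For the pickle question, here is a minimal sketch using only Python's standard `pickle` module. It is not taken from the RepMet or Detectron code. The helper names `save_detections` and `load_detections`, the file name, and the layout of the `detections` dictionary (image id mapped to a list of class name, score, and box) are assumptions made for illustration.

```python
import pickle


def save_detections(detections, path):
    """Write a detections object to disk as a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(detections, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_detections(path):
    """Read a detections object back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical layout: image id -> list of (class name, score, [x1, y1, x2, y2]).
example_detections = {
    "img_0001": [("dog", 0.91, [12.0, 30.5, 200.0, 180.0])],
    "img_0002": [],
}
```

Calling `save_detections(example_detections, "detections.pkl")` at the end of an evaluation loop and `load_detections("detections.pkl")` later would let the results be analysed without rerunning the test. Where exactly the per-image results are available inside few_shot_benchmark.py is not stated in this thread.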

yang-yk commented 4 years ago

Do you have a log file for the training process? Could you share one so I can check whether the training process is correct? By the way, did you change lr_iter in train_end2end.py? There is something wrong there.

duynn912 commented 4 years ago

Here is my log file: resnet_v1_101_voc0712_trainval_fpn_dcn_oneshot_end2end_ohem_8_2019-12-04-19-27.log. I changed lr to 0.005. Sorry, the 77.4% result is also on training data. I am running again on the test data (214 cls).

jshtok commented 4 years ago

Hi, I have reconstructed the .yaml file that produced the model with the reported performance; please see experiments/cfgs/resnet_v1_101_voc0712_trainval_fpn_dcn_oneshot_end2end_ohem_8_orig.yaml