megvii-research / FSCE

Apache License 2.0

Training Error #12

Closed AmingWu closed 3 years ago

AmingWu commented 3 years ago

Dear authors,

When I run your code, I get an error. I am using PyTorch 1.4.0 with torchvision 0.5.0. Could you give me some advice? Thank you.

[Attached screenshot: QQ截图20210326190809]

Chauncy-Cai commented 3 years ago

It seems the build did not succeed. Please make sure you have installed all the requirements and successfully executed "python setup.py build develop".

AmingWu commented 3 years ago

Thanks for your reply. I have successfully run your code. However, I cannot reproduce your result.

For 10-shot on Split 1, the nAP50 is 61, which is lower than your reported result.

[Attached screenshot: QQ截图20210328184724]

john2020-210 commented 3 years ago

@Chauncy-Cai @AmingWu How do you execute "python setup.py build develop"? When I do this, it reports the error "can not find the configs in the model_zoo".

bsun0802 commented 3 years ago

> @Chauncy-Cai @AmingWu How do you execute "python setup.py build develop"? When I do this, it reports the error "can not find the configs in the model_zoo".

@Chauncy-Cai Can you please address this?

> Thanks for your reply. I have successfully run your code. However, I cannot reproduce your result.
>
> For 10-shot on Split 1, the nAP50 is 61, which is lower than your reported result.

@AmingWu I don't know why; I seem to reach nAP > 62.7 easily. @Chauncy-Cai Are you still working on few-shot? Could you try running VOC? I think that yml of mine should give about 63 points. I can't run it myself right now: I have left the company and no longer have 8 GPUs.

Chauncy-Cai commented 3 years ago

> Thanks for your reply. I have successfully run your code. However, I cannot reproduce your result.
>
> For 10-shot on Split 1, the nAP50 is 61, which is lower than your reported result.

I reran this yaml (10-shot, Split 1) last weekend and easily reached 62.5+ nAP50.

Few-shot performance is unstable by nature.

Chauncy-Cai commented 3 years ago

> @Chauncy-Cai @AmingWu How do you execute "python setup.py build develop"? When I do this, it reports the error "can not find the configs in the model_zoo".

You can make a soft link to "FSCE/configs/"; this may work.
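For anyone unsure how to create such a link from Python, here is a minimal sketch. The helper name `link_configs` is made up for illustration, and the thread does not say where the link has to live, so the destination is left as a parameter; the path used in the demo is a throwaway temporary directory, not the location that `setup.py` actually checks.

```python
import os
import tempfile
from pathlib import Path


def link_configs(configs_dir, link_path):
    """Create a symlink at link_path pointing to configs_dir.

    Hypothetical helper: both arguments are chosen by the caller.
    An existing file, directory, or link at link_path is left untouched.
    """
    configs_dir = Path(configs_dir).resolve()
    link_path = Path(link_path)
    if not configs_dir.is_dir():
        raise FileNotFoundError(f"configs directory not found: {configs_dir}")
    if link_path.is_symlink() or link_path.exists():
        return link_path  # something is already there; do not overwrite it
    link_path.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(configs_dir, link_path, target_is_directory=True)
    return link_path


if __name__ == "__main__":
    # Demo in a temporary directory so nothing in a real checkout is modified.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "FSCE" / "configs"
        src.mkdir(parents=True)
        dst = Path(tmp) / "some_package" / "configs"  # placeholder destination
        print(link_configs(src, dst), "->", os.readlink(dst))
```

On Linux the shell equivalent is `ln -s <source> <destination>`. Creating symlinks on Windows may require extra privileges.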

john2020-210 commented 3 years ago

@AmingWu Sorry to bother you. I would like to ask about your hardware configuration. I keep getting errors when running the code. I also use PyTorch 1.4.0 with torchvision 0.5.0, and my graphics card is an RTX 2080 Ti. Looking forward to hearing from you.

AmingWu commented 3 years ago

@john2020-210, my hardware configuration is the same as yours: PyTorch 1.4.0 with torchvision 0.5.0, and the graphics card is an RTX 2080 Ti.

john2020-210 commented 3 years ago

Thanks for your reply. Does it have anything to do with the operating system, or with the installation process? My system is Ubuntu 20.04. What is yours? Is there anything else I should pay attention to during installation?

john2020-210 commented 3 years ago

@AmingWu Sorry to bother you. I have successfully run the code. However, I have only 1 GPU on my computer.

For 10-shot on Split 1, my nAP50 is 57, which is lower than yours. Could you share your parameter configuration?

bsun0802 commented 3 years ago

@john2020-210 The results can be reproduced only with 8 GPUs and the config we provide.

For setups other than 8 GPUs, we don't know whether the results can be reproduced or whether a different configuration is needed. At the very least, you can refer to the learning rate scaling rule (https://stackoverflow.com/questions/53033556/how-should-the-learning-rate-change-as-the-batch-size-change/53046624).
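As a rough illustration of that scaling rule, here is a minimal sketch of the common linear heuristic, where the learning rate changes in proportion to the total batch size. The function name and the numbers in the demo are illustrative assumptions, not values taken from the FSCE configs, and the heuristic is a starting point rather than a guarantee of matching the 8-GPU results.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: keep lr / batch_size constant.

    All three arguments are supplied by the caller; nothing here is
    read from an actual FSCE config file.
    """
    if base_lr <= 0 or base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("learning rate and batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


if __name__ == "__main__":
    # Illustrative numbers only: a reference run with 16 images per batch
    # at lr 0.02, moved to a setup that can only fit 2 images per batch.
    print(scale_learning_rate(0.02, base_batch_size=16, new_batch_size=2))
```

If the total batch size is kept the same on fewer GPUs, this rule leaves the learning rate unchanged; it only applies when the batch size itself has to shrink.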

Cuzny commented 3 years ago

Since "Total number of RoIs per training minibatch = ROI_HEADS.BATCH_SIZE_PER_IMAGE * SOLVER.IMS_PER_BATCH", what will happen if I use only 1 GPU but keep the same parameters (BATCH_SIZE_PER_IMAGE and IMS_PER_BATCH) as for 8 GPUs?

bsun0802 commented 3 years ago

@Cuzny You can refer to https://github.com/MegviiDetection/FSCE/blob/main/fsdet/config/defaults.py for a detailed explanation of the behavior of each config parameter.

Cuzny commented 3 years ago

@bsun0802 The size of minibatch seems to be 8 times larger?

bsun0802 commented 3 years ago

@Cuzny I guess so, yes, if there is enough GPU memory. But I have no experience training fsdet with 1 GPU, so I can't say for sure.
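The arithmetic behind this exchange can be sketched as follows. The sketch assumes that SOLVER.IMS_PER_BATCH counts images across all GPUs and is split evenly between them, as in Detectron2; whether fsdet behaves identically should be checked in defaults.py. The function name and the values 512 and 16 are illustrative, not read from an FSCE config.

```python
def minibatch_stats(rois_per_image, ims_per_batch, num_gpus):
    """Per-iteration totals and per-GPU load.

    Assumes ims_per_batch is the global image count, divided evenly
    across GPUs (an assumption borrowed from Detectron2).
    """
    if ims_per_batch % num_gpus != 0:
        raise ValueError("ims_per_batch must be divisible by num_gpus")
    images_per_gpu = ims_per_batch // num_gpus
    return {
        "total_rois": rois_per_image * ims_per_batch,
        "images_per_gpu": images_per_gpu,
        "rois_per_gpu": rois_per_image * images_per_gpu,
    }


if __name__ == "__main__":
    # Illustrative values: 512 RoIs per image, 16 images per batch.
    for gpus in (8, 1):
        print(gpus, "GPU(s):", minibatch_stats(512, 16, gpus))
```

Under that assumption the total number of RoIs per iteration stays the same, while the single GPU has to hold 8 times as many images, which is why memory becomes the limiting factor.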

Cuzny commented 3 years ago

@bsun0802 Maybe I should try to reduce the learning rate to about 1/3. Thanks for your reply.