zcablii / LSKNet

(IJCV2024 & ICCV2023) LSKNet: A Foundation Lightweight Backbone for Remote Sensing
489 stars 40 forks

[Docs] Some questions about the reproduction results of LSKNet_T on the DOTA-1.0 dataset #51

Closed gbdjxgp closed 8 months ago

gbdjxgp commented 8 months ago

Branch

master branch https://mmrotate.readthedocs.io/en/latest/

📚 The doc issue

Hello author, I am reproducing your code on a single GPU. Following the instructions in your documentation, I use multi-scale training and load only the pretrained backbone. The configuration file is LSKNet_T; I changed SyncBN to BN and changed the learning rate from the original 0.0002 to 0.0002/8. The log file from the training process is attached below. The results show a large accuracy gap: I cannot reach the 0.852 in your log. Is there a problem with my hyperparameter settings? Looking forward to your reply.

Suggest a potential alternative/fix

No response
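
For context, here is a minimal sketch of the single-GPU overrides described in this issue, written as an MMRotate-style Python child config. The base config name and the presence of a `norm_cfg` key on the backbone are assumptions based on the released LSKNet configs, not verified here:

```python
# Hypothetical single-GPU override config (a sketch, not a file from this repo).
# Assumes the standard MMRotate/MMDetection 2.x config conventions.
_base_ = ['./lsk_t_fpn_1x_dota_le90.py']  # illustrative base config name

# SyncBN requires distributed training; on a single GPU use plain BN instead.
model = dict(
    backbone=dict(norm_cfg=dict(type='BN', requires_grad=True)))

# Linear scaling rule: the released schedule uses 8 GPUs, so divide the base lr by 8.
# This merges with the base optimizer settings and only changes lr.
optimizer = dict(lr=0.0002 / 8)
```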

gbdjxgp commented 8 months ago

Here is the log file: 20240327_220346.log

zcablii commented 8 months ago

You may be facing the same situation as #43. You can try adjusting the learning rate up or down to get better performance, but this may not compensate for the side effects of the small batch size.
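
As a rough illustration of the linear scaling rule behind the lr/8 setting (the 2 images per GPU figure is an assumption taken from common DOTA configs, not checked against this repo):

```python
# Linear scaling rule sketch: scale lr with the effective batch size.
base_lr = 0.0002           # lr used in the released 8-GPU schedule
base_batch = 8 * 2         # 8 GPUs x (assumed) 2 images per GPU
single_gpu_batch = 1 * 2   # 1 GPU x 2 images per GPU

scaled_lr = base_lr * single_gpu_batch / base_batch
print(scaled_lr)           # 2.5e-05, i.e. the "lr/8" setting discussed here
```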

gbdjxgp commented 8 months ago

Thank you very much for your reply. I slightly increased the learning rate following your suggestion. Since our experimental equipment is limited, we only trained the first three epochs on the multi-scale dataset and compared validation accuracy at the third epoch:

  1. [Public training log]( https://download.openmmlab.com/mmrotate/v1.0/lsknet/lsk_t_fpn_1x_dota_le90/lsk_t_fpn_1x_dota_le90_20230206.log ), 8 GPUs, mAP=0.766
  2. 1 GPU, base lr/8 (0.000025), mAP=0.731 [20240327_220346.log]( https://github.com/zcablii/LSKNet/files/14785012/20240327_220346.log )
  3. 1 GPU, base lr/4 (0.00005), mAP=0.754 [20240328_182705.log]( https://github.com/zcablii/LSKNet/files/14800878/20240328_182705.log )
  4. 1 GPU, base lr/2 (0.0001), mAP=NaN [20240328_223842.log]( https://github.com/zcablii/LSKNet/files/14800896/20240328_223842.log )

May I ask: since increasing the learning rate gave some accuracy improvement at the third epoch, is raising the learning rate the right direction? If I want to reproduce the accuracy reported in the original paper as closely as possible, what further adjustments should I make? I also see in your log file that warmup is 500 iterations. Does that mean I should warm up for 4000 iterations when using 1 GPU?

zcablii commented 8 months ago

Adjusting the learning rate is an effective approach, and you can also increase the number of training epochs appropriately. I have never changed the warmup, so I am not sure whether it has a significant impact on performance.
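
For anyone who still wants to test the longer-warmup idea from the question above, the warmup length normally sits in `lr_config`; a hedged sketch assuming the standard MMDetection 2.x step schedule (the 4000 value is just the 8x-scaled guess from the question, not a verified setting):

```python
# Sketch of extending warmup for single-GPU training (assumption: the repo uses
# the standard MMDetection 2.x lr_config with a 1x step schedule and linear warmup).
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=4000,    # 500 iters at batch 16 covers roughly the same number
                          # of samples as 4000 iters at batch 2
    warmup_ratio=1.0 / 3,
    step=[8, 11])
```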

yyq0828 commented 7 months ago

@gbdjxgp Hello, may I ask whether you split the DOTA-v1.0 dataset using the code provided by the author? Could you share your split log? I would like to use it as a reference. Thank you. Here is my log: 20240409_232907.log

gbdjxgp commented 7 months ago

Just follow the documentation to split the dataset directly; I did not find a separate split file or split code.

CrazyBrick commented 6 months ago

@gbdjxgp Hi, have you reproduced the work successfully? If so, what adjustments did you make?

gbdjxgp commented 6 months ago

Hello, I did not run further experiments on the multi-scale dataset; I only experimented on the single-scale dataset. I recommend setting the single-GPU learning rate to 0.00005; the result on the single-scale dataset should be around 0.755.