zcablii / LSKNet

(IJCV2024 & ICCV2023) LSKNet: A Foundation Lightweight Backbone for Remote Sensing
489 stars 40 forks

[Docs] Some questions about the reproduction results of LSKNet_T on the DOTA-1.0 dataset #51

Closed gbdjxgp closed 8 months ago

gbdjxgp commented 8 months ago

Branch

master branch https://mmrotate.readthedocs.io/en/latest/

📚 The doc issue

Hello author, I am reproducing your code on a single GPU. Following the instructions in your documentation, I use multi-scale training and load only the pretrained backbone. The configuration file is LSKNet_T; I changed SyncBN to BN and changed the learning rate from the original 0.0002 to 0.0002/8. The log file from the training process is attached below. The results show a large accuracy gap: I cannot reach the 0.852 in your log. Is there a problem with my hyperparameter settings? Looking forward to your reply.

Suggest a potential alternative/fix

No response
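
For context, here is a minimal sketch of the single-GPU overrides described in this issue, written as an MMRotate-style Python child config. The base config name and the presence of a `norm_cfg` key on the backbone are assumptions based on the released LSKNet configs, not verified here:

```python
# Hypothetical single-GPU override config (a sketch, not a file from this repo).
# Assumes the standard MMRotate/MMDetection 2.x config conventions.
_base_ = ['./lsk_t_fpn_1x_dota_le90.py']  # illustrative base config name

# SyncBN requires distributed training; on a single GPU use plain BN instead.
model = dict(
    backbone=dict(norm_cfg=dict(type='BN', requires_grad=True)))

# Linear scaling rule: the released schedule uses 8 GPUs, so divide the base lr by 8.
# This merges with the base optimizer settings and only changes lr.
optimizer = dict(lr=0.0002 / 8)
```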

gbdjxgp commented 8 months ago

Here is the log file: 20240327_220346.log

zcablii commented 8 months ago

You may be facing the same situation as #43. You can try adjusting the learning rate up or down to get better performance, but this may not compensate for the side effects of the small batch size.
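
As a rough illustration of the linear scaling rule behind the lr/8 setting (the 2 images per GPU figure is an assumption taken from common DOTA configs, not checked against this repo):

```python
# Linear scaling rule sketch: scale lr with the effective batch size.
base_lr = 0.0002           # lr used in the released 8-GPU schedule
base_batch = 8 * 2         # 8 GPUs x (assumed) 2 images per GPU
single_gpu_batch = 1 * 2   # 1 GPU x 2 images per GPU

scaled_lr = base_lr * single_gpu_batch / base_batch
print(scaled_lr)           # 2.5e-05, i.e. the "lr/8" setting discussed here
```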

gbdjxgp commented 8 months ago

Thank you very much for your reply. I slightly increased the learning rate following your suggestion. Since our experimental equipment is limited, we only trained the first three epochs on the multi-scale dataset and compared validation accuracy at the third epoch:

  1. [Public training log]( https://download.openmmlab.com/mmrotate/v1.0/lsknet/lsk_t_fpn_1x_dota_le90/lsk_t_fpn_1x_dota_le90_20230206.log ), 8 GPUs, mAP=0.766
  2. 1 GPU, base lr/8 (0.000025), mAP=0.731 [20240327_220346.log]( https://github.com/zcablii/LSKNet/files/14785012/20240327_220346.log )
  3. 1 GPU, base lr/4 (0.00005), mAP=0.754 [20240328_182705.log]( https://github.com/zcablii/LSKNet/files/14800878/20240328_182705.log )
  4. 1 GPU, base lr/2 (0.0001), mAP=NaN [20240328_223842.log]( https://github.com/zcablii/LSKNet/files/14800896/20240328_223842.log )

May I ask: since increasing the learning rate gave some accuracy improvement at the third epoch, is raising the learning rate the right direction? If I want to reproduce the accuracy reported in the original paper as closely as possible, what further adjustments should I make? I also see in your log file that warmup is 500 iterations. Does that mean I should warm up for 4000 iterations when using 1 GPU?

zcablii commented 8 months ago

Adjusting the learning rate is an effective approach, and you can also increase the number of training epochs appropriately. I have never changed the warmup, so I am not sure whether it has a significant impact on performance.
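
For anyone who still wants to test the longer-warmup idea from the question above, the warmup length normally sits in `lr_config`; a hedged sketch assuming the standard MMDetection 2.x step schedule (the 4000 value is just the 8x-scaled guess from the question, not a verified setting):

```python
# Sketch of extending warmup for single-GPU training (assumption: the repo uses
# the standard MMDetection 2.x lr_config with a 1x step schedule and linear warmup).
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=4000,    # 500 iters at batch 16 covers roughly the same number
                          # of samples as 4000 iters at batch 2
    warmup_ratio=1.0 / 3,
    step=[8, 11])
```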

yyq0828 commented 7 months ago

@gbdjxgp Hello, may I ask whether you split the DOTA-v1.0 dataset using the code provided by the author? Could you share your split log? I would like to use it as a reference. Thank you. Here is my log: 20240409_232907.log

gbdjxgp commented 7 months ago

Just follow the documentation to split the dataset directly; I did not find a separate split file or split code.

CrazyBrick commented 6 months ago

@gbdjxgp Hi, have you reproduced the work successfully? If so, what adjustments did you make?

gbdjxgp commented 6 months ago

Hello, I did not run further experiments on the multi-scale dataset; I only experimented on the single-scale dataset. I recommend setting the single-GPU learning rate to 0.00005; the result on the single-scale dataset should be around 0.755.