Closed: gbdjxgp closed this issue 8 months ago
Below is the log file: 20240327_220346.log
You may face the same situation as #43. You can try adjusting the learning rate somewhat to get better performance, but this may not compensate for the side effects of the small batch size.
Thank you very much for your reply. Following your suggestion, I slightly increased the learning rate. Since our experimental hardware is limited, we trained only the first three epochs on the multi-scale dataset and compared validation accuracy at the third epoch:

- [Public training log](https://download.openmmlab.com/mmrotate/v1.0/lsknet/lsk_t_fpn_1x_dota_le90/lsk_t_fpn_1x_dota_le90_20230206.log), 8 GPUs, mAP = 0.766
- 1 GPU, base lr / 8 (0.000025), mAP = 0.731: [20240327_220346.log](https://github.com/zcablii/LSKNet/files/14785012/20240327_220346.log)
- 1 GPU, base lr / 4 (0.00005), mAP = 0.754: [20240328_182705.log](https://github.com/zcablii/LSKNet/files/14800878/20240328_182705.log)
- 1 GPU, base lr / 2 (0.0001), mAP = NaN: [20240328_223842.log](https://github.com/zcablii/LSKNet/files/14800896/20240328_223842.log)

Since increasing the learning rate improved the third-epoch accuracy, is raising the learning rate the right direction? And to reproduce the accuracy reported in the paper as closely as possible, what other adjustments should I make? I also see in your log file that the warmup is 500 iterations. Does that mean I should warm up for 4000 iterations when using 1 GPU?
Adjusting the learning rate is an effective approach, and you can also increase the number of training epochs appropriately. I have never changed the warmup, so I am not sure whether it has a significant impact on performance.
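For reference, a minimal sketch of how the single-GPU overrides discussed above could be written as a config delta, assuming an mmrotate-style layout like the released `lsk_t_fpn_1x_dota_le90` config (the base path and exact field names are assumptions; adjust them to the config you are actually training):

```python
# single_gpu_lr_sketch.py -- hypothetical override config, not part of the repo.
_base_ = ['./lsk_t_fpn_1x_dota_le90.py']  # assumed path; point this at your base config

# Linear scaling would suggest base lr / 8 for 1 GPU, but the experiments above
# found base lr / 4 (0.00005) gave the best third-epoch mAP (0.754 vs 0.731).
# This dict merges with the base optimizer settings (type, weight decay, etc.).
optimizer = dict(lr=0.00005)

# The released 8-GPU log warms up for 500 iterations; whether 1 GPU benefits from a
# longer warmup (e.g. ~4000 iterations) is the open question above. The author
# reports never having changed it, so this keeps the original value.
lr_config = dict(warmup_iters=500)
```

Training would then be launched on this file the usual way, e.g. `python tools/train.py single_gpu_lr_sketch.py`.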
@gbdjxgp Hi, for splitting DOTA-v1.0, did you use the code provided by the author? Could you share your split log? I would like to use it as a reference, thank you. This is my log: 20240409_232907.log
Just do the splitting by following the documentation; I did not find a separate split file or split code myself.
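In case it helps, below is a sketch of how the DOTA split is usually run, assuming this repo keeps mmrotate's standard split tooling under `tools/data/dota/split/` (the script path, flag, and JSON name are taken from mmrotate's data-preparation docs and are assumptions about this particular checkout):

```python
# run_dota_split.py -- hypothetical wrapper; the split tool is normally run directly
# from the shell, this just invokes the same command from Python.
import subprocess

# Multi-scale train+val split config shipped with mmrotate; edit the image and
# annotation directories inside the JSON to point at your DOTA-v1.0 data first.
split_json = "tools/data/dota/split/split_configs/ms_trainval.json"

subprocess.run(
    ["python", "tools/data/dota/split/img_split.py", "--base-json", split_json],
    check=True,
)
```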
@gbdjxgp Hi, have you reproduced the work successfully? If so, what adjustments did you make?
Hello, I did not run further experiments on the multi-scale dataset; I only experimented on the single-scale dataset. I recommend setting the single-GPU learning rate to 0.00005; the single-scale result should be around 0.755.
Branch
master branch https://mmrotate.readthedocs.io/en/latest/
📚 The doc issue
Hello, I am reproducing your code on a single GPU. Following the instructions in your documentation, I use multi-scale training and load only the pretrained backbone; the config file is LSKNet_T, SyncBN is changed to BN, and the learning rate is changed from 0.0002 to 0.0002/8. The log file from the training process is attached above. The experimental results show a large accuracy gap: I cannot reach the 0.852 in your log. Is there an issue with my hyperparameter settings? Looking forward to your reply.
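For reference, the modifications described above might look roughly like the following config override. This is only a sketch: the base config path, the backbone `norm_cfg` field, and the pretrained-checkpoint handling are assumptions based on the released configs and may need adjusting to this repo's actual layout.

```python
# lsk_t_single_gpu_sketch.py -- hypothetical override, not an official config.
_base_ = ['./lsk_t_fpn_1x_dota_le90.py']  # assumed path to the released LSKNet_T config

model = dict(
    backbone=dict(
        # Single-GPU (non-distributed) training cannot use SyncBN, so swap it for plain BN.
        norm_cfg=dict(type='BN', requires_grad=True),
        # Load only the pretrained backbone weights (checkpoint path is a placeholder).
        init_cfg=dict(type='Pretrained', checkpoint='path/to/lsk_t_backbone_pretrained.pth'),
    ))

# Base 8-GPU lr divided by 8, i.e. the setting used for the attached log (0.000025);
# later comments in this thread suggest base lr / 4 (0.00005) works better on 1 GPU.
optimizer = dict(lr=0.000025)
```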
Suggest a potential alternative/fix
No response