open-mmlab / mmskeleton

An OpenMMLab toolbox for human pose estimation, skeleton-based action recognition, and action synthesis.
Apache License 2.0
2.94k stars 1.04k forks

Accuracy Issue #311

Open gravesprite opened 4 years ago

gravesprite commented 4 years ago

Hi, thank you for sharing your code. I was trying to train a model on the NTU-RGB dataset and followed xview/train.yaml exactly. However, the loss did not go down at all. I really need some help; could anyone give me some advice?

INFO:mmcv.runner.runner:Epoch(train) [75][295] loss: 4.1075, top1: 0.0166, top5: 0.0872
INFO:mmcv.runner.runner:Epoch [76][100/588] lr: 0.00100, eta: 0:08:05, time: 0.136, data_time: 0.015, memory: 6742, loss: 4.1307
INFO:mmcv.runner.runner:Epoch [76][200/588] lr: 0.00100, eta: 0:07:47, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1336
INFO:mmcv.runner.runner:Epoch [76][300/588] lr: 0.00100, eta: 0:07:30, time: 0.127, data_time: 0.004, memory: 6742, loss: 4.1310
INFO:mmcv.runner.runner:Epoch [76][400/588] lr: 0.00100, eta: 0:07:13, time: 0.128, data_time: 0.004, memory: 6742, loss: 4.1332
INFO:mmcv.runner.runner:Epoch [76][500/588] lr: 0.00100, eta: 0:06:55, time: 0.128, data_time: 0.004, memory: 6742, loss: 4.1295
INFO:mmcv.runner.runner:Epoch [77][100/588] lr: 0.00100, eta: 0:06:22, time: 0.138, data_time: 0.015, memory: 6742, loss: 4.1288
INFO:mmcv.runner.runner:Epoch [77][200/588] lr: 0.00100, eta: 0:06:05, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1325
INFO:mmcv.runner.runner:Epoch [77][300/588] lr: 0.00100, eta: 0:05:48, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1337
INFO:mmcv.runner.runner:Epoch [77][400/588] lr: 0.00100, eta: 0:05:31, time: 0.126, data_time: 0.004, memory: 6742, loss: 4.1360
INFO:mmcv.runner.runner:Epoch [77][500/588] lr: 0.00100, eta: 0:05:14, time: 0.126, data_time: 0.004, memory: 6742, loss: 4.1318
INFO:mmcv.runner.runner:Epoch [78][100/588] lr: 0.00100, eta: 0:04:41, time: 0.138, data_time: 0.015, memory: 6742, loss: 4.1277
INFO:mmcv.runner.runner:Epoch [78][200/588] lr: 0.00100, eta: 0:04:24, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1274
INFO:mmcv.runner.runner:Epoch [78][300/588] lr: 0.00100, eta: 0:04:07, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1361
INFO:mmcv.runner.runner:Epoch [78][400/588] lr: 0.00100, eta: 0:03:50, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1322
INFO:mmcv.runner.runner:Epoch [78][500/588] lr: 0.00100, eta: 0:03:33, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1356
INFO:mmcv.runner.runner:Epoch [79][100/588] lr: 0.00100, eta: 0:03:01, time: 0.137, data_time: 0.015, memory: 6742, loss: 4.1330
INFO:mmcv.runner.runner:Epoch [79][200/588] lr: 0.00100, eta: 0:02:44, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1324
INFO:mmcv.runner.runner:Epoch [79][300/588] lr: 0.00100, eta: 0:02:27, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1322
INFO:mmcv.runner.runner:Epoch [79][400/588] lr: 0.00100, eta: 0:02:10, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1268
INFO:mmcv.runner.runner:Epoch [79][500/588] lr: 0.00100, eta: 0:01:53, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1339
INFO:mmcv.runner.runner:Epoch [80][100/588] lr: 0.00100, eta: 0:01:21, time: 0.137, data_time: 0.015, memory: 6742, loss: 4.1321
INFO:mmcv.runner.runner:Epoch [80][200/588] lr: 0.00100, eta: 0:01:05, time: 0.125, data_time: 0.003, memory: 6742, loss: 4.1380
INFO:mmcv.runner.runner:Epoch [80][300/588] lr: 0.00100, eta: 0:00:48, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1323
INFO:mmcv.runner.runner:Epoch [80][400/588] lr: 0.00100, eta: 0:00:31, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1301
INFO:mmcv.runner.runner:Epoch [80][500/588] lr: 0.00100, eta: 0:00:14, time: 0.126, data_time: 0.003, memory: 6742, loss: 4.1315
INFO:mmcv.runner.runner:Epoch(train) [80][295] loss: 4.1073, top1: 0.0165, top5: 0.0875
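
A note on reading these numbers: they sit almost exactly at chance level for a 60-class problem, which is the number of action classes in NTU RGB+D. The sketch below is only a sanity check, assuming a cross-entropy loss and a model that guesses uniformly at random; `chance_level` is a hypothetical helper, not part of mmskeleton.

```python
import math


def chance_level(num_classes, k=5):
    """Return (loss, top1, topk) expected from uniform random guessing."""
    loss = math.log(num_classes)              # cross-entropy of a uniform prediction
    top1 = 1.0 / num_classes                  # chance of guessing the right class
    topk = min(k, num_classes) / num_classes  # chance the right class is among k guesses
    return loss, top1, topk


# NTU RGB+D has 60 action classes.
print("loss %.4f, top1 %.4f, top5 %.4f" % chance_level(60))
```

Since ln(60) is about 4.094 and 1/60 is about 0.0167, a loss that stays near 4.1 with top1 near 0.0166 suggests the network has learned nothing beyond a uniform guess, rather than that it is converging slowly. The same check with 400 classes gives a loss of about 5.99.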

yosagaf commented 4 years ago

Did you find a solution to your problem?

gravesprite commented 4 years ago

Did you find a solution to your problem?

No, the issue still exists.

xiaoyang-coder commented 4 years ago

Did you find a solution to your problem?

vivek87799 commented 4 years ago

I have exactly the same problem with the NTU-RGB-xsub dataset. Does anyone have a solution for it?

vivek87799 commented 4 years ago

training_hooks: 
    lr_config: 
      policy: 'step' 
      step: [20, 30, 40, 50] 
    log_config: 
      interval: 100 
      hooks: 
        - type: TextLoggerHook 
    checkpoint_config: 
      interval: 5 
    optimizer_config: 
      grad_clip:

xiaoyang-coder commented 4 years ago

@vivek87799 thank you

fnxiang commented 4 years ago

I have the same problem when training on Kinetics data. Because I have only one GPU, I set gpus: 1 and batch_size: 128. I don't know how to set lr or the other parameters. The loss never converges.

YeTaoY commented 4 years ago

I have the same problem when training on Kinetics data. The loss is about 6 and does not converge.

YeTaoY commented 4 years ago

Hi guys, what @vivek87799 did fixed my problem.

happysheep224 commented 4 years ago

Hi guys, what @vivek87799 did fixed my problem.

Which file should I modify? train.yaml?

xiaoyang-coder commented 4 years ago

I didn't try his method, but it should be this file (train.yaml).

happysheep224 commented 4 years ago

@YeTaoY how does this method work? I tried it, but it did not work. Do you have any more tips?

paleomoon commented 4 years ago

I just added grad_clip in train.yaml as vivek87799 said, and now the loss is decreasing. But I don't understand how this works.
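
On how grad_clip works in general: gradient clipping rescales the gradients whenever their overall norm exceeds a threshold, so one very large gradient cannot throw the weights far off in a single step. The following is a minimal pure-Python sketch of clipping by L2 norm, in the spirit of PyTorch's `torch.nn.utils.clip_grad_norm_`. The function name, the flat list of gradients, and the `max_norm=5.0` value are illustrative assumptions, not the settings used in this thread, and the thread does not confirm that exploding gradients were the actual cause here.

```python
import math


def clip_grad_norm(grads, max_norm):
    """Rescale grads in place so their L2 norm is at most max_norm.

    Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        for i in range(len(grads)):
            grads[i] *= scale  # direction is kept, only the length shrinks
    return total_norm


# Hypothetical exploding gradient with L2 norm 500.
grads = [300.0, -400.0]
norm_before = clip_grad_norm(grads, max_norm=5.0)
norm_after = math.sqrt(sum(g * g for g in grads))
print(norm_before, norm_after, grads)
```

Gradients whose norm is already below the threshold pass through unchanged, so clipping only affects the occasional oversized update. That is one plausible reason a run stuck at chance level can start to learn once clipping is enabled.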