zhou745 / GauFuse_WSTAL

Abnormal training results #7

Closed: eeehco closed this issue 11 months ago

eeehco commented 11 months ago

Hello author, I ran into a problem while training on THUMOS14. The accuracy improves steadily for the first 200 epochs, but after epoch 200 the accuracy drops to 0, the loss becomes NaN, and the learning rate becomes 0. I am using the original code without any changes except for the location of the data. Why does this happen? Is the learning rate or batch size too high?
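One way to narrow down where the NaN first appears is to stop training as soon as the loss is no longer finite. The following is a minimal sketch, not code from this repository; the helper name `assert_finite_loss` is hypothetical, and it assumes the loss has already been converted to a Python float (with PyTorch, for example via `loss.item()`).

```python
import math


def assert_finite_loss(loss_value: float, epoch: int, step: int) -> float:
    """Raise as soon as the loss is NaN or infinite; otherwise return it unchanged."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"non-finite loss {loss_value!r} at epoch {epoch}, step {step}"
        )
    return loss_value
```

Calling this once per training step makes the run fail at the first bad value instead of continuing with zero accuracy, which shows which epoch and batch to inspect.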

Lcode119 commented 11 months ago

I ran into the same problem. Pseudo labels start being used at the 201st epoch, and generating them involves solving a linear program to find the optimal solution. For the sample with idx 123, however, the solver reports an error saying that no solution exists. Why is this? The error message is as follows: The problem is infeasible. (HiGHS Status 8: model_status is Infeasible; primal_status is At lower/fixed bound). Could the author help?
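The message above is what `scipy.optimize.linprog` reports when the HiGHS solver finds no feasible point. As a minimal sketch (not the repository's pseudo-label code), the result can be checked before it is used, so that an infeasible sample is skipped instead of propagating NaN values. The helper name `solve_lp_or_none` and the toy problems are assumptions made for illustration; `method="highs"` requires SciPy 1.6 or later.

```python
import numpy as np
from scipy.optimize import linprog


def solve_lp_or_none(c, A_ub=None, b_ub=None, A_eq=None, b_eq=None, bounds=(0, None)):
    """Return the LP solution as an array, or None if no usable solution exists."""
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    # status 0 means optimal; status 2 means infeasible (see the linprog docs).
    if not res.success or res.x is None or not np.all(np.isfinite(res.x)):
        print(f"linprog failed (status {res.status}): {res.message}")
        return None
    return np.asarray(res.x)


# Feasible toy problem: minimize x0 + x1 subject to x0 + x1 >= 1, x >= 0.
feasible = solve_lp_or_none(c=[1.0, 1.0], A_ub=[[-1.0, -1.0]], b_ub=[-1.0])

# Infeasible toy problem: x0 <= -1 contradicts the bound x0 >= 0.
infeasible = solve_lp_or_none(c=[1.0], A_ub=[[1.0]], b_ub=[-1.0])
```

A caller could treat a `None` return as "no pseudo label for this sample". Whether skipping is appropriate for this method is a design decision for the author; the sketch only shows how to detect the failure.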

zhou745 commented 11 months ago

I recall someone else encountering this error. It turned out to be related to the scipy and numpy versions: with some versions, the linear programming step produces NaN values.

Lcode119 commented 11 months ago

Thank you, after changing the relevant version, the problem was solved!

eeehco commented 11 months ago

Thank you, after changing the relevant version, the problem was solved!

Hello, may I ask which versions you changed to? I still haven't solved this problem.

eeehco commented 11 months ago

Thank you, after changing the relevant version, the problem was solved!

Hello, I tried changing the version, but it still doesn't work. Can you tell me which version you are using? Thank you.

zhou745 commented 11 months ago

I used torch 1.11.0, numpy 1.22.4, scikit-learn 1.1.2, and scipy 1.8.1, with a 3090 graphics card.

eeehco commented 11 months ago

Thank you! I tried the same version combination as you, but it raises the error "ImportError: cannot import name 'interp1d' from 'scipy.interpolate'". To get the code to run, I changed the numpy version, but after that, NaN appeared again after training for 200 epochs.

eeehco commented 11 months ago

The combination "torch 1.11.0 + numpy 1.22.4 + scikit-learn 1.1.2 + scipy 1.7.0" is the one that works.
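Since the fix depends on exact package versions, it can help to compare the installed versions against the combination reported here before starting a long training run. This is a minimal sketch using only the standard library; the helper name `find_mismatches` is hypothetical, and the pinned versions are simply the ones quoted in this thread.

```python
from importlib import metadata

# Combination reported as working at the end of this thread.
EXPECTED = {
    "torch": "1.11.0",
    "numpy": "1.22.4",
    "scikit-learn": "1.1.2",
    "scipy": "1.7.0",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "1.11.0+cu113" still count as a match.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

If the script prints nothing, the environment matches the reported combination. The `get_version` parameter exists only so the check can be exercised without installing the packages.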