Thanks for reporting. We will look into it.
Hi @gaodihe , could you let us know your versions of python, numpy, and pytorch? That could help us identify the problem.
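One way to gather this information in a single step is sketched below. It is only an illustration: the helper name collect_versions is hypothetical, and it assumes PyTorch is importable as torch and NumPy as numpy. It avoids f-strings so it also runs on older Python 3 interpreters.

```python
import importlib
import platform


def collect_versions(packages=("numpy", "torch")):
    """Return a dict mapping 'python' and each package name to a version string."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__; fall back to "unknown" if not.
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = "not installed"
    return versions


# Print one "name: version" line per entry, ready to paste into the issue.
for name, version in sorted(collect_versions().items()):
    print("{}: {}".format(name, version))
```

Pasting the printed lines into the thread gives the exact interpreter and library versions without checking each one by hand.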
Tried a few times. Cannot replicate the problem yet :(
Hi, I'm using Python 3.5.2, NumPy 1.15.1, and PyTorch 0.4.0.
At 2018-12-21 23:20:55, "Quan Wang" notifications@github.com wrote:
Hi @gaodihe , could you let us know your versions of python, numpy, and pytorch? That could help us identify the problem.
Actually, I cannot replicate it either. After trying a few times, the program runs correctly with an accuracy of 1. That problem seems to have disappeared, but the run_test.sh problem still exists. run_test.sh does run correctly from another file path (I think there is no difference except for some data).
At 2018-12-21 23:46:27, "Quan Wang" notifications@github.com wrote:
Tried a few times. Cannot replicate the problem yet :(
@gaodihe Thanks for your information!
I just created a new issue: https://github.com/google/uis-rnn/issues/16
Once this is done, I may need your help to re-run the tests with a high verbosity value and share the logs with us.
Currently we don't have sufficient information to debug this.
@gaodihe Actually, even before I resolve that bug, could you share with me the full STDOUT information of your failing test?
Ran 4 tests in 0.001s
Traceback (most recent call last):
  File "tests/integration_test.py", line 99, in test_four_clusters
    self.assertEqual(1.0, accuracy)
AssertionError: 1.0 != 0.9
Ran 1 test in 15.784s
Ran 4 tests in 0.001s
Ran 4 tests in 0.001s
Traceback (most recent call last):
  File "tests/integration_test.py", line 99, in test_four_clusters
    self.assertEqual(1.0, accuracy)
AssertionError: 1.0 != 0.9
Ran 1 test in 16.022s
My initial guess is that the network simply didn't converge to a good point at the end of training.
0.9 is still a high accuracy, though we were expecting 1.0.
A few things to try to validate this:
1. In integration_test.py, change training_args.train_iteration = 200 to a larger value like 300, to see if you get accuracy = 1.0.
2. In the setUp() function of integration_test.py, change the values of the random seeds, to see if the accuracy becomes 1.0.

In general this issue could be avoided by training multiple networks in parallel and picking the best one.
But there is also room for us to improve the training process and the default arguments to make the training more robust and efficient.
@AnzCol Please take a look at this to see if you have any thoughts.
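The idea of training several networks and keeping the best one can be sketched roughly as follows. This is a minimal, sequential illustration rather than the project's actual workflow: pick_best_run and toy_train_and_evaluate are hypothetical names, the toy function returns made-up accuracies in place of a real training run, and a real setup would replace it with code that seeds the random number generators, trains a model, and evaluates it.

```python
from typing import Any, Callable, Iterable, Tuple


def pick_best_run(
    train_and_evaluate: Callable[[int], Tuple[Any, float]],
    seeds: Iterable[int],
) -> Tuple[int, Any, float]:
    """Run one training per seed and return (seed, model, accuracy) of the best.

    On ties, the earliest seed is kept.
    """
    best = None
    for seed in seeds:
        model, accuracy = train_and_evaluate(seed)
        if best is None or accuracy > best[2]:
            best = (seed, model, accuracy)
    if best is None:
        raise ValueError("seeds must not be empty")
    return best


def toy_train_and_evaluate(seed):
    """Hypothetical stand-in for a real training run (accuracies are made up)."""
    made_up_accuracy = {0: 0.9, 1: 1.0, 2: 0.8}[seed]
    return "model-for-seed-{}".format(seed), made_up_accuracy


best_seed, best_model, best_accuracy = pick_best_run(
    toy_train_and_evaluate, seeds=[0, 1, 2]
)
print("best seed: {}, accuracy: {}".format(best_seed, best_accuracy))
```

The runs are independent of each other, so the loop could be distributed across processes or machines; the sequential form is used here only to keep the sketch short.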
Yes, I have tried your suggestions. Both of them solve the problem. Thanks for your help.
@gaodihe Thanks for trying it. It's very helpful!
It basically validated that the failure is due to unsuccessful training.
In practice we usually have many more training steps than the unit/integration tests do. The purpose of the tests is to validate code correctness, and we often run them after a small code change, so we prefer to use fewer steps to make them fast rather than stable.
Hi, I ran demo.py twice. The first time it worked well, with an accuracy of 1. But when I deleted the model and tried again, it ran but the accuracy was only about 0.8. I'm sure I didn't change the program. I tried deleting the whole program and running git clone again, and the result is still about 0.8. Then I ran run_test.sh and got an error:
======================================================================
FAIL: test_four_clusters (__main__.TestIntegration)
Four clusters on vertices of a square.
Traceback (most recent call last):
  File "tests/integration_test.py", line 99, in test_four_clusters
    self.assertEqual(1.0, accuracy)
AssertionError: 1.0 != 0.9
Ran 1 test in 17.543s
FAILED (failures=1)
Something strange must be happening. Could anyone tell me what could cause this? Thanks.