official-stockfish / fishtest

The Stockfish testing framework
https://tests.stockfishchess.org/tests
281 stars 130 forks source link

SPSA test completed with failed status #1070

Open ppigazzini opened 2 years ago

ppigazzini commented 2 years ago

https://tests.stockfishchess.org/tests/view/61604c5883dd501a05b0af65

Screenshot_20211009_212412

On discord the user reported:

not the first time I see a finished tuning is shown as failed. Is this a known bug?

vdbergh commented 2 years ago

It could be due to the test setting an option that does not exist. Then the worker will stop the test. However I guess there should be some audit trail for this (like in the case of an incorrect bench).

EDIT: Hmmm I am now seeing that already 800000 games have been played and that the test should be finished. So this seems unrelated. I guess we have to look at the possible reasons why a test can fail.

vdbergh commented 2 years ago

It seems run["failed"]=True can only be set by invoking the api api_stop_run. But in that case there should be a message in actiondb.

ppigazzini commented 2 years ago
It should be this (I checked "Actions" before opening the issue... I obviously missed it ...): Time Username Run/User Action
21-10-10 09:26:21 SFisGOD NtuneFT delete run
21-10-10 09:25:30 SFisGOD NtuneFT stop run: grandphish2-16cores-33d39163: Wrong bench in stockfish_f4a6136b68dd970dc1a12f5618f829781a26a137.exe Expected: 5717809 Got: 5565085
21-10-10 02:39:08 SFisGOD NtuneFT modify:
21-10-10 02:37:59 SFisGOD NtuneFT stop run: grandphish2-16cores-33d39163: Wrong bench in stockfish_f4a6136b68dd970dc1a12f5618f829781a26a137.exe Expected: 5717809 Got: 5565085
21-10-10 02:37:26 SFisGOD NtuneFT modify: priority changed from 0 to -10
21-10-10 02:36:52 SFisGOD NtuneFT modify:
21-10-09 16:41:57 SFisGOD NtuneFT modify: throughput changed from 200 to 100
21-10-09 07:47:46 SFisGOD NtuneFT modify: throughput changed from 100 to 200
21-10-09 07:46:27 SFisGOD NtuneFT modify: throughput changed from 400 to 100
21-10-08 18:40:01 SFisGOD NtuneFT modify: throughput changed from 200 to 400
21-10-08 14:16:49 SFisGOD NtuneFT modify: throughput changed from 100 to 200
21-10-08 13:49:12 SFisGOD NtuneFT new run
vdbergh commented 2 years ago

Hmmm. This raises more questions :frowning_face: . Why is the message there twice? And why did it happen after the run was finished?

EDIT. The two message are wide apart in time. So probably these were for different tests (but the same engine).

vdbergh commented 2 years ago

The last update for this test was received 2021-10-09 18:44:28. So this does not correspond to the two action messages....

ppigazzini commented 2 years ago

Hmmm. This raises more questions ☹️ . Why is the message there twice? And why did it happen after the run was finished?

EDIT. The two message are wide apart in time. So probably these were for different tests (but the same engine).

I think that the developers restarted the test with modify (at 21-10-10 02:39:08) after the first bench error. Perhaps the second bench error is happened when only some games were missing to complete the test.

We should stop a test only when N users report a bench error.

vdbergh commented 2 years ago

I did not know that it is possible to restart a finished test.

EDIT: I looked at the code for tests_modify. It seems it can create runs with run["finished"]=False which are logically finished (run["num_games"] has been completed). Such a run cannot hand out tasks, so it will stay in this state forever. Perhaps it eventually leads to a failed run?

vdbergh commented 2 years ago

No this is not possible since run["failed"]=True can only set by api_stop_run. Mystery.

ppigazzini commented 2 years ago

This is clear:

This is a mystery:

vdbergh commented 2 years ago

The last update for this test was received 2021-10-09 18:44:28. So this does not correspond to the two action messages...

I deleted the previous post since the explanation is incorrect. If the last update was received at 2021-10-09 18:44:28 the test must have been finished at that point. @ppigazzini Can you explain this?

vdbergh commented 2 years ago

What I am saying is that a finished test does not hand out tasks, even if run["finished"] happens to be False for some reason. So in my universe, the first api_stop_run cannot happen.

ppigazzini commented 2 years ago

Ok, not considered the timeline, so also the first api_stop_run is a mystery (if we are logging the time in the same timezone).

ppigazzini commented 2 years ago

I will be away from the keyboard for some hours, no time to check now. I'm wondering if that worker reported the error after a connection problem (if I recall correctly we are doubling a timeout somewhere).

ppigazzini commented 2 years ago

We are doubling over and over sleep(delay) for any exception raised in games(), eg Exception accessing host:

https://github.com/glinscott/fishtest/blob/278a946a191fed9e2d45f5258b0fd93c846dc382/worker/worker.py#L504-L511

https://github.com/glinscott/fishtest/blob/278a946a191fed9e2d45f5258b0fd93c846dc382/worker/worker.py#L229-L238

BTW we are doubling sleep(delay) also when the queue is empty...

Worker version 111 connecting to https://dfts-0.pigazzini.it:443
UUID: b28f79f3-358a-41d1-8435-5088839614ae
Start heartbeat
Fetch task...
API call rate limits: {'limit': 5000, 'used': 0, 'remaining': 5000, 'reset': 1633892427}
Worker version checked successfully in 0.144731s
Task requested in 0.125268s
No tasks available at this time, waiting...

delay=15.0
Fetch task...
API call rate limits: {'limit': 5000, 'used': 0, 'remaining': 5000, 'reset': 1633892450}
Worker version checked successfully in 0.299908s
Task requested in 0.126309s
No tasks available at this time, waiting...

delay=30.0
Fetch task...
API call rate limits: {'limit': 5000, 'used': 0, 'remaining': 5000, 'reset': 1633892485}
Worker version checked successfully in 0.124021s
Task requested in 0.12332s
No tasks available at this time, waiting...

  Send heartbeat for b28f79f3-358a-41d1-8435-5088839614ae ... Pleased to hear from you...
delay=60.0
vdbergh commented 2 years ago

Still I don't see how it can happen. If the worker gets an exception executing api_stop_run (e.g. because of a connection problem), I don't think it will retry the api call. So I don't understand how the api_stop_run could happen, many hours after the test was finished.

ppigazzini commented 2 years ago

The worker:

  1. downloads test src, test nn and compiles test
  2. downloads base src, base nn and compiles base
  3. download cutechess-cli
  4. download book
  5. check benches for test and base

For any connection exception in 3-4 the worker retries doubling the previous delay. After that the connection restarts and the big sleep(delay) finishes, the worker checks benches and calls api_stop_run.

This could explain the first stop action and the huge delay wrt the last update: I don't know if it's possible to set run["failed"]=True for an already completed run, though. I don't have idea about the second stop action from the same worker.

vdbergh commented 2 years ago

What you are saying seems possible, although it requires the collusion of two events, which seem already unlikely by themselves: an invalid bench and a network outage.

Although it does not give the full explanation. we should probably disable resurrecting finished tests using the "Modify" trick since this seems a recipe for bugs. But then we should only stop a test after several wrong benches, to avoid that a single worker with a HW defect ruins a test.

I don't think it would be wise to change the semantics of the existing stop_run so we may need a new api: wrong_bench. Unless we can use failed_task.

vdbergh commented 2 years ago

It seems failed_task currently does not write an action. This counts as a bug I think.

ppigazzini commented 2 years ago

I don't understand what the failed_task api does, we call it after a successful games() call too. From git blame that logic is 9 years old. (BTW we should drop all the except: pass) https://github.com/glinscott/fishtest/blob/278a946a191fed9e2d45f5258b0fd93c846dc382/worker/worker.py#L232-L254

vdbergh commented 2 years ago

You are right. It seems that all failed_task does is to set task["active"]=False (this means no worker is working on this task). It does not mean failure. So failed_task is misnomer.

vdbergh commented 2 years ago

Still I wonder if failed_task is actually called after successful completion of the task. It should execute the following statement

print(
            "Inactive: https://tests.stockfishchess.org/tests/view/{} {} {}".format(
                run_id, task if DEBUG else task_id, worker_name(task["worker_info"])
            ),
            flush=True,
        )
ppigazzini commented 2 years ago

EDIT_000: finally: clause is executed no matter what, and is generally used to release external resources.

https://github.com/glinscott/fishtest/blob/278a946a191fed9e2d45f5258b0fd93c846dc382/server/fishtest/rundb.py#L32 https://github.com/glinscott/fishtest/blob/278a946a191fed9e2d45f5258b0fd93c846dc382/server/fishtest/rundb.py#L955-L960 https://github.com/glinscott/fishtest/blob/278a946a191fed9e2d45f5258b0fd93c846dc382/server/fishtest/rundb.py#L978-L980

Now DEBUG = False so the log has:

Oct 11 11:06:34 tests.stockfishchess.org pserve[23040]: Failed_task: task 91 is not active
Oct 11 11:06:34 tests.stockfishchess.org pserve[23040]: dead task: run: https://tests.stockfishchess.org/tests/view/6164120283dd501a05b0b1cb task_id: 30 worker: ChessDBCN-16cores-586a55f7
Oct 11 11:06:49 tests.stockfishchess.org pserve[23040]: Failed_task: task 26 is not active
Oct 11 11:06:55 tests.stockfishchess.org pserve[23040]: Failed_task: task 99 is not active
Oct 11 11:07:06 tests.stockfishchess.org pserve[23040]: dead task: run: https://tests.stockfishchess.org/tests/view/6163fddc83dd501a05b0b1b1 task_id: 76 worker: ChessDBCN-16cores-aca87cf7
Oct 11 11:07:22 tests.stockfishchess.org pserve[23040]: Failed_task: task 108 is not active
Oct 11 11:07:23 tests.stockfishchess.org pserve[23040]: Failed_task: task 100 is not active
Oct 11 11:07:23 tests.stockfishchess.org pserve[23040]: Failed_task: task 94 is not active
Oct 11 11:07:28 tests.stockfishchess.org pserve[23040]: Failed_task: task 1270 is not active
Oct 11 11:07:35 tests.stockfishchess.org pserve[23040]: Failed_task: task 91 is not active
Oct 11 11:08:19 tests.stockfishchess.org pserve[23040]: Failed_task: task 76 is not active
fishtest@tests:~$ Oct 11 11:06:04 tests.stockfishchess.org pserve[23040]: Failed_task: task 205 is not active
Oct 11 11:07:06 tests.stockfishchess.org pserve[23040]: dead task: run: https://tests.stockfishchess.org/tests/view/6163fddc83dd501a05b0b1b1 task_id: 76 worker: ChessDBCN-16cores-aca87cf7
Oct 11 11:07:22 tests.stockfishchess.org pserve[23040]: Failed_task: task 108 is not active
Oct 11 11:07:23 tests.stockfishchess.org pserve[23040]: Failed_task: task 100 is not active
Oct 11 11:07:23 tests.stockfishchess.org pserve[23040]: Failed_task: task 94 is not active

In the past DEBUG = True so the log had:

Aug 06 01:46:08 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610b6ef72a8a49ac5be7948c 1741 Dantist-7cores-3-4e4ee4e0
Aug 06 01:46:15 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c4bcd2a8a49ac5be7953e 214 Dantist-15cores-5-e4b251d3
Aug 06 01:46:15 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c76d02a8a49ac5be79550 28 Dantist-7cores-4-4f22dbe0
Aug 06 01:46:21 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c76d02a8a49ac5be79550 27 technologov-2cores-42-fd429153
Aug 06 01:46:24 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c0a452a8a49ac5be794ef 618 Dantist-7cores-3-4e4ee4e0
Aug 06 01:46:30 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c76d02a8a49ac5be79550 27 Dantist-15cores-5-e4b251d3
Aug 06 01:46:31 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c0ef92a8a49ac5be794f5 375 Dantist-7cores-4-4f22dbe0
Aug 06 01:46:39 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c4bcd2a8a49ac5be7953e 215 Dantist-7cores-3-4e4ee4e0
Aug 06 01:46:43 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c31a52a8a49ac5be79529 294 technologov-2cores-42-fd429153
Aug 06 01:46:46 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c177a2a8a49ac5be7950c 314 Dantist-15cores-5-e4b251d3
Aug 06 01:46:46 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c4bcd2a8a49ac5be7953e 216 Dantist-7cores-4-4f22dbe0
Aug 06 01:46:55 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c31a52a8a49ac5be79529 297 Dantist-7cores-3-4e4ee4e0
Aug 06 01:47:01 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c76d02a8a49ac5be79550 28 Dantist-15cores-5-e4b251d3
Aug 06 01:47:01 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c76d72a8a49ac5be79554 34 Dantist-7cores-4-4f22dbe0
Aug 06 01:47:04 tests.stockfishchess.org pserve[7486]: Inactive: https://tests.stockfishchess.org/tests/view/610c4bcd2a8a49ac5be7953e 216 technologov-2cores-42-fd429153
vdbergh commented 2 years ago

But

       print(
            "Inactive: https://tests.stockfishchess.org/tests/view/{} {} {}".format(
                run_id, task if DEBUG else task_id, worker_name(task["worker_info"])
            ),
            flush=True,
        )

in rundb.py:failed_task() should always be executed. It does not depend on DEBUG=True.

ppigazzini commented 2 years ago

Ops, sorry, multitasking here, missed the different subject and missed that "Inactive:" is sparse, it shows up using grep.

sudo journalctl -u fishtest@6543 --since "1 days ago" | grep "Inactive:"

Oct 11 03:41:06 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/61636ecd83dd501a05b0b189 184 ChessDBCN-16cores-71bc52c3
Oct 11 03:59:12 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163b4b983dd501a05b0b19d 13 ChessDBCN-16cores-c85dfd66
Oct 11 03:59:12 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163b4b983dd501a05b0b19d 11 ChessDBCN-16cores-813ef38e
Oct 11 03:59:28 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163b4b983dd501a05b0b19d 12 ChessDBCN-16cores-d6e8e7fc
Oct 11 04:01:55 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163b4b983dd501a05b0b19d 26 ChessDBCN-16cores-5d22698b
Oct 11 04:16:04 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163b4b983dd501a05b0b19d 40 ChessDBCN-16cores-d7a959f2
Oct 11 04:32:02 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/616332c683dd501a05b0b15c 711 bcross-15cores-8ad8ea2c
Oct 11 04:51:09 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163684183dd501a05b0b183 326 okrout-30cores-9b7f73f1
Oct 11 05:00:04 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/616332c683dd501a05b0b15c 746 TueRens-3cores-eab43f1d
Oct 11 05:02:58 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163c19083dd501a05b0b1a9 14 TueRens-7cores-529b897b
Oct 11 06:14:07 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/616332c683dd501a05b0b15c 828 user213718-1cores-8f1b2f77
Oct 11 07:56:28 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163511083dd501a05b0b16e 729 bcross-15cores-8ad8ea2c
Oct 11 08:08:20 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163510183dd501a05b0b16a 960 ctoks-3cores-5ea45efb
Oct 11 10:35:55 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/616411f983dd501a05b0b1c9 15 ChessDBCN-16cores-b2a736f8
Oct 11 10:47:38 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6164120a83dd501a05b0b1cd 22 grandphish2-16cores-1a757e23
Oct 11 11:24:13 tests.stockfishchess.org pserve[23040]: Inactive: https://tests.stockfishchess.org/tests/view/6163fe1983dd501a05b0b1b3 124 bcross-15cores-8ad8ea2c
vdbergh commented 2 years ago

Ok. I guess it would be better to call it stop_task. With a message in actiondb if the task finished uncleanly.

ppigazzini commented 2 years ago

The "Inactive:" log entry changed between 2021-09-11 and 2021-09-11 (dynamic task?):

sudo journalctl -u fishtest@6543 --since  "2021-09-11 00:00:00" --until "2021-09-11 23:59:59" | grep "Inactive:" | wc -l
12755
sudo journalctl -u fishtest@6543 --since  "2021-09-12 00:00:00" --until "2021-09-12 23:59:59" | grep "Inactive:" | wc -l
67

One month entries before and after 2021-09-11:

sudo journalctl -u fishtest@6543 --since  "2021-08-11 00:00:00" --until "2021-09-11 23:59:59" | grep "Inactive:" | wc -l
341237
sudo journalctl -u fishtest@6543 --since  "2021-09-12 00:00:00" --until "2021-10-11 23:59:59" | grep "Inactive:" | wc -l
3370
vdbergh commented 2 years ago

It may be due to some changes of mine. I think most invocations now end in this

       if not task["active"]:
            print("Failed_task: task {} is not active".format(task_id), flush=True)
            return {"task_alive": False}

If a task completes normally this will be noticed by the last invocation of rundb.py:update_task() and it will set the task to non active. Hence the above code in rundb.py:failed_task() will be triggered.

Only if a task finishes non normally will the Inactive: log entry be created.

At least that's what I think.

vdbergh commented 2 years ago

In other words. Calling failed_task after a task that completed normally is harmless but also useless.

ppigazzini commented 2 years ago

In other words. Calling failed_task after a task that completed normally is harmless but also useless.

Good, we could retire that api. Thanks for the behavior analysis!

sudo journalctl -u fishtest@6543 --since  "2021-08-11 00:00:00" --until "2021-09-11 23:59:59" | grep "Failed_task:" | wc -l
0
sudo journalctl -u fishtest@6543 --since  "2021-09-12 00:00:00" --until "2021-10-11 23:59:59" | grep "Failed_task:" | wc -l
394033