official-stockfish / fishtest

The Stockfish testing framework
https://tests.stockfishchess.org/tests
281 stars 129 forks source link

Incorrect username/password #323

Closed noobpwnftw closed 3 years ago

noobpwnftw commented 6 years ago

For some reason the ChessDB_CN_WorkerPool login no longer works. Maybe someone can check if there is a bug or the password has indeed been changed(not by me).

EDIT: I can see a new restriction may have been applied, thus I have made a new account ChessDBCN which follows the new rules, please approve and raise machine limit, thanks!

Stefano80 commented 6 years ago

Hi @noobpwnftw , interestingly, I cannot unblock users at the moment, so something worse must be happening. Specifically, I get a waitress error when I click on Pending.

EDIT: I can login normally.

noobpwnftw commented 6 years ago

Right, I only can't login with that particular account, the others seemed fine.

Stefano80 commented 6 years ago

@noobpwnftw : can you post the error message you get?

noobpwnftw commented 6 years ago

On worker it says: Fetch task... Incorrect username/password Exception accessing host: Traceback (most recent call last): File "fishtest-master/worker/worker.py", line 92, in worker SystemExit: 1

On website it says Invalid password

Stefano80 commented 6 years ago

Ok, thx. I will get into this in the evening, hope is ok for you.

ppigazzini commented 6 years ago

I just checked, there is not an user ChessDB_CN_WorkerPool in mongoDB.

python << EOF
import os, sys
sys.path.append(os.path.expanduser('~/fishtest/fishtest/fishtest'))
from rundb import RunDb
rdb = RunDb()
print rdb.userdb.get_user('ChessDB_CN_WorkerPool')
EOF

None
ppigazzini commented 6 years ago

I view that there is a pending user, but I haven't the approver right on the production server to check if the user was blocked.

Stefano80 commented 6 years ago

Hi @ppigazzini I have the rights but I get a waitress error when I try to view this.

noobpwnftw commented 6 years ago

The account shouldn't disappear though... So probably another disk full occurred?

Stefano80 commented 6 years ago

Hi, I have manually unblocked the new user, and checked that the disk is not full. Can you check whether it works?

Stefano80 commented 6 years ago

Hi @ppigazzini, should I just add you to the approvers? If you don't want for some reasons, ok, but I think it would make sense.

Stefano80 commented 6 years ago

So there are two problems still: 1) Why can't I login to the Pending view 2) Why the original user from @noobpwnftw was missing?

noobpwnftw commented 6 years ago

@Stefano80 the new account works now.

Stefano80 commented 6 years ago

Good. One down, two to go.

tomtor commented 6 years ago

I also get the waitress error when i try to view /pending

What does the error log show?

Stefano80 commented 6 years ago

Mmhh... cannot find the error logs, we need @ppigazzini (can you then add the information to the very cool overview?)

ppigazzini commented 6 years ago

@Stefano80 I'm not a C++ SF developer, so I'm not a good approver. The logs are in (overview updated): /home/fishtest/mongodb/bin/mongod.out /home/fishtest/fishtest/fishtest/nohup.out

Unfortunately my first automatic action was to reboot the VPS and at the moment there is not log rotation, after a service restart or server reboot the logs are overwritten. I try to change the startup scripts in order to save save the old log.

Anyway here are the new logs: mongod.zip nohup.zip

Stefano80 commented 6 years ago

Thx @ppigazzini ! I triggered the waitress error some times and I repeatedly checked the mongo log. The only thing which looks like some culprit could be this

Sat Jul 14 22:04:46.415 [TTLMonitor] ERROR: key for ttl index can only have 1 field
Sat Jul 14 22:04:46.415 [TTLMonitor] ERROR: key for ttl index can only have 1 field
Sat Jul 14 22:04:46.415 [TTLMonitor] ERROR: key for ttl index can only have 1 field

Does it make sense to anybody?

tomtor commented 6 years ago

What does nohup.out show?

Stefano80 commented 6 years ago

Nothing.

noobpwnftw commented 6 years ago

Workers got "no tasks available" very often, I'm running only 30 of them.

ppigazzini commented 6 years ago

Also my username is deleted ...

ppigazzini commented 6 years ago

This is the log of one of my workers. It was running fine, then an error during the test and, when the batch of 250 games was finished, the error with the login.

Started game 65 of 250 (New-26e3bd6 vs Base-a6fa6a9)
Finished game 63 (New-26e3bd6 vs Base-a6fa6a9): 1/2-1/2 {Draw by adjudication}
Score of New-26e3bd6 vs Base-a6fa6a9: 10 - 15 - 38  [0.460] 63
Task updated successfully in 0.132186s
Started game 66 of 250 (Base-a6fa6a9 vs New-26e3bd6)
Finished game 64 (Base-a6fa6a9 vs New-26e3bd6): 0-1 {Black wins by adjudication}
Score of New-26e3bd6 vs Base-a6fa6a9: 11 - 15 - 38  [0.469] 64
Task updated successfully in 0.141231s
Exception from calling update_task:
Traceback (most recent call last):
  File "/home/fishtest/fishtest-master/worker/games.py", line 265, in run_game
    if not req['task_alive']:
KeyError: 'task_alive'
Started game 67 of 250 (New-26e3bd6 vs Base-a6fa6a9)
Finished game 65 (New-26e3bd6 vs Base-a6fa6a9): 1/2-1/2 {Draw by insufficient mating material}
Score of New-26e3bd6 vs Base-a6fa6a9: 11 - 15 - 39  [0.469] 65
Task updated successfully in 0.092387s
Exception from calling update_task:
Traceback (most recent call last):
  File "/home/fishtest/fishtest-master/worker/games.py", line 265, in run_game
    if not req['task_alive']:
KeyError: 'task_alive'

....

Finished game 248 (Base-a6fa6a9 vs New-26e3bd6): 1/2-1/2 {Draw by insufficient mating material}
Score of New-26e3bd6 vs Base-a6fa6a9: 43 - 46 - 160  [0.494] 249
Task updated successfully in 0.105444s
Exception from calling update_task:
Traceback (most recent call last):
  File "/home/fishtest/fishtest-master/worker/games.py", line 265, in run_game
    if not req['task_alive']:
KeyError: 'task_alive'
Finished game 250 (Base-a6fa6a9 vs New-26e3bd6): 1-0 {White wins by adjudication}
Score of New-26e3bd6 vs Base-a6fa6a9: 43 - 47 - 160  [0.492] 250
Task updated successfully in 0.109892s
Exception from calling update_task:
Traceback (most recent call last):
  File "/home/fishtest/fishtest-master/worker/games.py", line 265, in run_game
    if not req['task_alive']:
KeyError: 'task_alive'
Elo difference: -5.56 +/- 25.85
Finished match
Finished match cleanly
Note: <type 'exceptions.OSError'> killing the process pid: 14895, possibly already terminated
Wait 10 seconds before upload of PGN...
Uploading compressed PGN of 363344 bytes
Task exited
Fetch task...
Incorrect username/password
Exception accessing host:
Traceback (most recent call last):
  File "fishtest-master/worker/worker.py", line 92, in worker
SystemExit: 1
Fetch task...
Incorrect username/password
Exception accessing host:
Traceback (most recent call last):
  File "fishtest-master/worker/worker.py", line 92, in worker
SystemExit: 1
ppigazzini commented 6 years ago

update_users.py exits with this error:

[fishtest@tests ~]$ ~/fishtest/fishtest/utils/update_users.py
Traceback (most recent call last):
  File "/home/fishtest/fishtest/fishtest/utils/update_users.py", line 104, in <module>
    main()
  File "/home/fishtest/fishtest/fishtest/utils/update_users.py", line 101, in main
    update_users()
  File "/home/fishtest/fishtest/fishtest/utils/update_users.py", line 72, in update_users
    process_run(run, info)
  File "/home/fishtest/fishtest/fishtest/utils/update_users.py", line 33, in process_run
    info[username]['last_updated'] = task['last_updated']
KeyError: u'ChessDB_CN_WorkerPool'
noobpwnftw commented 6 years ago

Ok, so if it isn't some security breach, shouldn't we just upgrade MongoDB and restore the database from a backup?

noobpwnftw commented 6 years ago

I don't think there can be some kind of bug in fishtest that deletes the account, since it's been running fine for so long. The only variable would be the increasing size of database that may eventually trigger some weird bug of ancient MongoDB installation.

Also my new account still has its machine limit, probably someone can help with raising it. The VMs I set up last time is still there, might be possible act as test servers.

tomtor commented 6 years ago

/pending and update_users.py crash because ChessDB_CN_WorkerPool is in user_cache but not in users. This should never happen and there is no code in fishtest server that deletes user entries.

Creating a new ChessDB_CN_WorkerPool account (we want to keep the statistics in user_cache) could fix this unless other active accounts are also gone.

The only code which DOES delete users (idle ones) is scavenge.py so if we rule out a mongodb issue then we should inspect that code closely.

tomtor commented 6 years ago

I think I found a possible cause. Under heavy load mongodb queries return incomplete results. So if an active user is dropped from the result list then scavenge.py will remove this user.

In this case ChessDB_CN_WorkerPool probably because it had very frequent updates.

Commenting out scavenge_users() from scavenge.py is a short term fix.

noobpwnftw commented 6 years ago

It could be the cause, but that doesn't explain why @ppigazzini 's account disappeared too.

ppigazzini commented 6 years ago

@noobpwnftw raised the limit to 200 machine for your new account. @tomtor scavenge_users() commented out. Also my account "ctoks" is gone.

tomtor commented 6 years ago

As long as we don't sync user and user_cache, /pending crashes and user CPU contributions are not recorded.

Usernames with '_' are no longer allowed. The most simple action would be to add @ppigazzini account in the normal way (just use the old names) and rename ChessDB_CN_WorkerPool to ChessDBCN by hand in the user_cache database?

ppigazzini commented 6 years ago

I'm wondering if the simple solution to restore the deleted users could be mongorestore dump/fishtest_new/users.bson from the backup of 13.07.2018. From the documentation "mongorestore performs inserts only and does not perform updates. That is, if restoring documents to an existing database and collection and existing documents have the same value _id field as the to-be-restored documents, mongorestore will not overwrite those documents."

tomtor commented 6 years ago

@ppigazzini Restoring just users sounds good

Stefano80 commented 6 years ago

Hi @ppigazzini , I agree with @tomtor , from the documentation is what we need.

ppigazzini commented 6 years ago

Users successfully restored:

$ mongorestore  dump/fishtest_new/users.bson
connected to: 127.0.0.1
Sun Jul 15 21:07:26.024 dump/fishtest_new/users.bson
Sun Jul 15 21:07:26.024 going into namespace [fishtest_new.users]
Sun Jul 15 21:07:26.488 warning: Restoring to fishtest_new.users without dropping. Restored data will be inserted without raising errors; check your server log
1234 objects found
Sun Jul 15 21:07:26.493 Creating index: { key: { _id: 1 }, ns: "fishtest_new.users", name: "_id_" }
Sun Jul 15 21:07:26.593 Creating index: { key: { username: 1 }, unique: true, ns: "fishtest_new.users", name: "username_1" }

There are 5 users inactive from 1 day, candidate to be the users deleted.

Stefano80 commented 6 years ago

Thx. Interestingly, it solved also the waitress error on the pending user page, fixing the last of the problems :-)

Should we close this issue or keep it open until we confirm somehow that the explanation proposed by @tomtor is the correct one?

ppigazzini commented 6 years ago

@Stefano80 I think that all the code that check the users list was exiting with an error as update_users.py

ppigazzini commented 6 years ago

@Stefano80 Please keep open the issue. If the @tomtor hypothesis is correct, we need to fix the scavenge_users().

tomtor commented 6 years ago

Restore looks good, thanks @ppigazzini

Indeed we need to fix scavenge.py at a later date so let's keep this open.

ppigazzini commented 6 years ago

Fast fix for the backup of the logs after a reboot

#!/bin/sh
DIR=/home/fishtest/logs
STAMP=$(date +%Y%m%d%H%M%S --utc -d '1 hour')
cd /home/fishtest

(
cd mongodb/bin
cp mongod.out $DIR/mongod.out.$STAMP
nohup ./mongod > mongod.out &
)

sleep 5

(
cd fishtest/fishtest
cp nohup.out $DIR/nohup.out.$STAMP
./start.sh
)
ppigazzini commented 5 years ago

@tomtor the official server is running with scavenge_users() comment out by hand in scavenge.py. A new setup of the fishtest server will run scavenge_users(). Perhaps is safer to open a PR to comment out scavenge_users().

tomtor commented 5 years ago

@ppigazzini I now know the root cause:

update_users.py will empty the user_cache for a small period when it runs.

If scavenge.py runs at the same time then users will deleted because they are considered not active.

We should add the scavenge_users functionality to update_users to fix this at a later time...

ppigazzini commented 5 years ago

@tomtor we should also run the maintenance scripts in a more robust way. On the dev server (w/ a lower IO) a new scheduled instance of update_users.py was started while the previous one was still running: perhaps not dangerous for the DB integrity but could bring down the server performance.

kelvinwop commented 5 years ago

I also can't login, I tried making another account but that doesn't work either

Worker version 59 connecting to http://tests.stockfishchess.org:80 Incorrect username/password Exception accessing host: Traceback (most recent call last): File "E:\fishtest\Fishtest-Worker-Portable\fishtest\worker-CPU0\worker.py", line 62, in worker sys.exit(1) SystemExit: 1

tomtor commented 5 years ago

@kelvinwop You have to be unlocked. Login should work now...