official-stockfish / fishtest

The Stockfish testing framework
https://tests.stockfishchess.org/tests
281 stars 130 forks source link

Validation of Internal Data Struture Failed [24‑06‑29 04:58:02] #2086

Closed xu-shawn closed 2 months ago

xu-shawn commented 3 months ago
image
xu-shawn commented 3 months ago

Text: Validation of internal data structures failed: run_cache['6672f777602682471b06515d'] is not in the schema

vdbergh commented 3 months ago

Hmmm. This is mystifying. I wonder if perhaps an older version of vtjson is currently running on PROD?

ppigazzini commented 3 months ago

Hmmm. This is mystifying. I wonder if perhaps an older version of vtjson is currently running on PROD?

No, PROD is running vtjson v 1.9.0, the latest on pypy https://pypi.org/project/vtjson/ In the last weeks I always tested your PRs also on PROD before merging, and that requires to rebuild the server from scratch, downloading the new packages.

:~$ ${VENV}/bin/python3 -m pip list
Package              Version     Editable project location
-------------------- ----------- ------------------------------
awscli               1.33.17
botocore             1.34.135
certifi              2024.6.2
charset-normalizer   3.3.2
colorama             0.4.6
dnspython            2.6.1
docutils             0.16
email_validator      2.2.0
fishtest-server      0.1         /home/fishtest/fishtest/server
hupper               1.12.1
idna                 3.7
jmespath             1.0.1
Mako                 1.3.5
MarkupSafe           2.1.5
numpy                2.0.0
PasteDeploy          3.1.0
pip                  24.1.1
plaster              1.1.2
plaster-pastedeploy  1.0.1
pyasn1               0.6.0
Pygments             2.18.0
pymongo              4.8.0
pyramid              2.0.2
pyramid_debugtoolbar 4.12.1
pyramid-mako         1.1.0
python-dateutil      2.9.0.post0
python-magic         0.4.27
PyYAML               6.0.1
requests             2.32.3
rsa                  4.7.2
s3transfer           0.10.2
scipy                1.14.0
setuptools           70.1.1
six                  1.16.0
translationstring    1.4
urllib3              2.2.2
venusian             3.1.0
vtjson               1.9.0
waitress             3.0.0
WebOb                1.8.7
wheel                0.43.0
zope.deprecation     5.0
zope.interface       6.4.post2
zxcvbn               4.4.28
vdbergh commented 3 months ago

Thanks. The thing is that the error is exactly explained by having an older version of vtjson that did not know about keys being themselves schemas (the run_cache schema is a dict with a single key which is the run_id schema).

Validating or run_cache used to work fine. It currently also works fine on DEV apparently. vtjson does not keep any state (e.g. no internal cache).

Can you maybe print vtjson.__version__ after starting the server just to be absolutely sure that the version is ok?

ppigazzini commented 3 months ago

+import vtjson + +print("vtjson:", vtjson.version, flush=True) +

def thread_stack_dump(sig, frame): for th in threading.enumerate():

- venv built yesterday when testing #1993
```bash
$ ls -la ~/fishtest/server/env/lib/python3.12/site-packages/vtjson*
-rw-rw-r-- 1 fishtest fishtest 40241 Jun 28 18:53 /home/fishtest/fishtest/server/env/lib/python3.12/site-packages/vtjson.py

/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/vtjson-1.9.0.dist-info:
total 52
drwxrwxr-x  2 fishtest fishtest  4096 Jun 28 18:53 .
drwxrwxr-x 93 fishtest fishtest  4096 Jun 28 18:53 ..
-rw-rw-r--  1 fishtest fishtest     4 Jun 28 18:53 INSTALLER
-rw-rw-r--  1 fishtest fishtest  1077 Jun 28 18:53 LICENSE
-rw-rw-r--  1 fishtest fishtest 22424 Jun 28 18:53 METADATA
-rw-rw-r--  1 fishtest fishtest   579 Jun 28 18:53 RECORD
-rw-rw-r--  1 fishtest fishtest    92 Jun 28 18:53 WHEEL
-rw-rw-r--  1 fishtest fishtest     7 Jun 28 18:53 top_level.txt
vdbergh commented 3 months ago

Thanks. Yes it seems 1.9.0 is really running...

So currently I cannot explain why this error suddenly appeared and only on PROD :frowning_face:

I will think more.

vdbergh commented 3 months ago

BTW the error is annoying but benign.

vdbergh commented 3 months ago

Apparently the error has stopped now. Very mysterious. If it appears again we should probably enhance the code to print more detailed error messages (i.e. what is the object that does not validate, what is the actual schema...).

ppigazzini commented 3 months ago
~$ sudo journalctl -u fishtest@6543 --since "24 hours ago" | grep "Started Fishtest"
Jun 28 18:53:18 ns304014 systemd[1]: Started Fishtest Server port 6543.
Jun 29 09:02:09 ns304014 systemd[1]: Started Fishtest Server port 6543.
Jun 29 09:41:38 ns304014 systemd[1]: Started Fishtest Server port 6543.

List of start:

  1. rebuild the server to test #1993
  2. revert to master (without rebuilding)
  3. print the vtjson version (without rebuilding)

The second start could have fixed something about that failed validation.

vdbergh commented 3 months ago

So maybe it has something to do with #1993 although cannot fathom what.

vdbergh commented 3 months ago

I also observed it on my local Fishtest. This means I can debug it!

ppigazzini commented 3 months ago

Kernighan’s law

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it

vdbergh commented 3 months ago

Perhaps. Unfortunately I do not know how to trigger the bug. I only observed it once on my local Fishtest.

vdbergh commented 2 months ago

This can be closed as well. The bug was due to #1993.