Closed xu-shawn closed 2 months ago
Text:
Validation of internal data structures failed: run_cache['6672f777602682471b06515d'] is not in the schema
Hmmm. This is mystifying. I wonder if perhaps an older version of vtjson is currently running on PROD?
Hmmm. This is mystifying. I wonder if perhaps an older version of vtjson is currently running on PROD?
No, PROD is running vtjson v 1.9.0
, the latest on pypy
https://pypi.org/project/vtjson/
In the last weeks I always tested your PRs also on PROD before merging, and that requires to rebuild the server from scratch, downloading the new packages.
:~$ ${VENV}/bin/python3 -m pip list
Package Version Editable project location
-------------------- ----------- ------------------------------
awscli 1.33.17
botocore 1.34.135
certifi 2024.6.2
charset-normalizer 3.3.2
colorama 0.4.6
dnspython 2.6.1
docutils 0.16
email_validator 2.2.0
fishtest-server 0.1 /home/fishtest/fishtest/server
hupper 1.12.1
idna 3.7
jmespath 1.0.1
Mako 1.3.5
MarkupSafe 2.1.5
numpy 2.0.0
PasteDeploy 3.1.0
pip 24.1.1
plaster 1.1.2
plaster-pastedeploy 1.0.1
pyasn1 0.6.0
Pygments 2.18.0
pymongo 4.8.0
pyramid 2.0.2
pyramid_debugtoolbar 4.12.1
pyramid-mako 1.1.0
python-dateutil 2.9.0.post0
python-magic 0.4.27
PyYAML 6.0.1
requests 2.32.3
rsa 4.7.2
s3transfer 0.10.2
scipy 1.14.0
setuptools 70.1.1
six 1.16.0
translationstring 1.4
urllib3 2.2.2
venusian 3.1.0
vtjson 1.9.0
waitress 3.0.0
WebOb 1.8.7
wheel 0.43.0
zope.deprecation 5.0
zope.interface 6.4.post2
zxcvbn 4.4.28
Thanks. The thing is that the error is exactly explained by having an older version of vtjson
that did not know about keys being themselves schemas (the run_cache
schema is a dict
with a single key which is the run_id
schema).
Validating or run_cache
used to work fine. It currently also works fine on DEV apparently. vtjson
does not keep any state (e.g. no internal cache).
Can you maybe print vtjson.__version__
after starting the server just to be absolutely sure that the version is ok?
sudo journalctl -u fishtest@6543 --since "2 minutes ago"
Jun 29 09:41:37 ns304014 pserve[99462]: Clean_wtt_map: 131 active workers...
Jun 29 09:41:37 ns304014 pserve[99462]: flush
Jun 29 09:41:37 ns304014 systemd[1]: Stopping Fishtest Server port 6543...
Jun 29 09:41:37 ns304014 pserve[99462]: .............................done
Jun 29 09:41:38 ns304014 systemd[1]: fishtest@6543.service: Deactivated successfully.
Jun 29 09:41:38 ns304014 systemd[1]: Stopped Fishtest Server port 6543.
Jun 29 09:41:38 ns304014 systemd[1]: fishtest@6543.service: Consumed 29.814s CPU time.
Jun 29 09:41:38 ns304014 systemd[1]: Started Fishtest Server port 6543.
Jun 29 09:41:39 ns304014 pserve[99713]: vtjson: 1.9.0
Jun 29 09:41:39 ns304014 pserve[99713]: Starting server in PID 99713.
diff
diff --git a/server/fishtest/__init__.py b/server/fishtest/__init__.py
index a915eb47..c071a1aa 100644
--- a/server/fishtest/__init__.py
+++ b/server/fishtest/__init__.py
@@ -17,6 +17,10 @@ from pyramid.session import SignedCookieSessionFactory
from fishtest import helpers
+import vtjson + +print("vtjson:", vtjson.version, flush=True) +
def thread_stack_dump(sig, frame): for th in threading.enumerate():
- venv built yesterday when testing #1993
```bash
$ ls -la ~/fishtest/server/env/lib/python3.12/site-packages/vtjson*
-rw-rw-r-- 1 fishtest fishtest 40241 Jun 28 18:53 /home/fishtest/fishtest/server/env/lib/python3.12/site-packages/vtjson.py
/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/vtjson-1.9.0.dist-info:
total 52
drwxrwxr-x 2 fishtest fishtest 4096 Jun 28 18:53 .
drwxrwxr-x 93 fishtest fishtest 4096 Jun 28 18:53 ..
-rw-rw-r-- 1 fishtest fishtest 4 Jun 28 18:53 INSTALLER
-rw-rw-r-- 1 fishtest fishtest 1077 Jun 28 18:53 LICENSE
-rw-rw-r-- 1 fishtest fishtest 22424 Jun 28 18:53 METADATA
-rw-rw-r-- 1 fishtest fishtest 579 Jun 28 18:53 RECORD
-rw-rw-r-- 1 fishtest fishtest 92 Jun 28 18:53 WHEEL
-rw-rw-r-- 1 fishtest fishtest 7 Jun 28 18:53 top_level.txt
Thanks. Yes it seems 1.9.0 is really running...
So currently I cannot explain why this error suddenly appeared and only on PROD :frowning_face:
I will think more.
BTW the error is annoying but benign.
Apparently the error has stopped now. Very mysterious. If it appears again we should probably enhance the code to print more detailed error messages (i.e. what is the object that does not validate, what is the actual schema...).
~$ sudo journalctl -u fishtest@6543 --since "24 hours ago" | grep "Started Fishtest"
Jun 28 18:53:18 ns304014 systemd[1]: Started Fishtest Server port 6543.
Jun 29 09:02:09 ns304014 systemd[1]: Started Fishtest Server port 6543.
Jun 29 09:41:38 ns304014 systemd[1]: Started Fishtest Server port 6543.
List of start:
The second start could have fixed something about that failed validation.
So maybe it has something to do with #1993 although cannot fathom what.
I also observed it on my local Fishtest. This means I can debug it!
Kernighan’s law
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it
Perhaps. Unfortunately I do not know how to trigger the bug. I only observed it once on my local Fishtest.
This can be closed as well. The bug was due to #1993.