vertexproject / synapse

Synapse Central Intelligence System
Apache License 2.0

[BUG] Updating hugenum index values results in CantRevLayer: layer is from the future error and container exit #2597

Closed bertdg closed 2 years ago

bertdg commented 2 years ago

Describe the bug: When updating Synapse versions, the hugenum change seems to have affected revCoreLayers in a way that causes the cortex to exit. I am checking whether this is a known bug, or whether I missed a required step during the upgrade. (I am also running Docker containers against master, which is clearly not a best practice.) There is no impact for me, since I can delete the data volumes and start from scratch, but if I had vital data it would be nice to know how to recover.

To Reproduce:
docker-compose pull
docker-compose up

version: '3'

services:
  cortex_0:
    environment:

Expected behavior: Expect the hugenum change not to put a cortex into an unstartable state. Bonus if there is documentation on how to recover from a state like this.

Environment (please complete the following information):

Additional context:

cortex_0_1 | {"message": "Updating hugenum index values: /vertex/storage/layers/9261972f63e1c41338fb60648d2b67fd", "logger": {"name": "synapse.lib.layer", "process": "MainProcess", "filename": "layer.py", "func": "_layrV7toV8"}, "level": "WARNING", "time": "2022-03-15 22:12:14,352"}
cortex_0_1 | {"message": "...complete!", "logger": {"name": "synapse.lib.layer", "process": "MainProcess", "filename": "layer.py", "func": "_layrV7toV8"}, "level": "WARNING", "time": "2022-03-15 22:12:14,355"}
cortex_0_1 | Traceback (most recent call last):
cortex_0_1 |   File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
cortex_0_1 |     return _run_code(code, main_globals, None,
cortex_0_1 |   File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
cortex_0_1 |     exec(code, run_globals)
cortex_0_1 |   File "/usr/local/lib/python3.8/site-packages/synapse/servers/cortex.py", line 8, in <module>
cortex_0_1 |     asyncio.run(s_cortex.Cortex.execmain(sys.argv[1:]))
cortex_0_1 |   File "/usr/local/lib/python3.8/asyncio/runners.py", line 44, in run
cortex_0_1 |     return loop.run_until_complete(main)
cortex_0_1 |   File "/usr/local/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
cortex_0_1 |     return future.result()
cortex_0_1 |   File "/usr/local/lib/python3.8/site-packages/synapse/lib/cell.py", line 2406, in execmain
cortex_0_1 |     cell = await cls.initFromArgv(argv, outp=outp)
cortex_0_1 |   File "/usr/local/lib/python3.8/site-packages/synapse/lib/cell.py", line 2353, in initFromArgv
cortex_0_1 |     cell = await cls.anit(opts.dirn, conf=conf)
cortex_0_1 |   File "/usr/local/lib/python3.8/site-packages/synapse/lib/base.py", line 97, in anit
cortex_0_1 |     await self.anit(*args, **kwargs)
cortex_0_1 |   File "/usr/local/lib/python3.8/site-packages/synapse/lib/cell.py", line 1026, in anit
cortex_0_1 |     await self.initServiceRuntime()
cortex_0_1 |   File "/usr/local/lib/python3.8/site-packages/synapse/cortex.py", line 1324, in initServiceRuntime
cortex_0_1 |     await self._checkLayerModels()
cortex_0_1 |   File "/usr/local/lib/python3.8/site-packages/synapse/cortex.py", line 3490, in _checkLayerModels
cortex_0_1 |     await mrev.revCoreLayers()
cortex_0_1 |   File "/usr/local/lib/python3.8/site-packages/synapse/lib/modelrev.py", line 374, in revCoreLayers
cortex_0_1 |     raise s_exc.CantRevLayer(layer=layr.iden, mesg=mesg, curv=version, layv=vers)
cortex_0_1 | synapse.exc.CantRevLayer: CantRevLayer: curv=(0, 2, 7) layer='9261972f63e1c41338fb60648d2b67fd' layv=(0, 2, 8) mesg='layer Layer 9261972f63e1c41338fb60648d2b67fd (/vertex/storage/layers/9261972f63e1c41338fb60648d2b67fd) is from the future!'
ti-analysis-synapse_cortex_0_1 exited with code 1

Thank you

bertdg commented 2 years ago

Running the Vertex-provided Python script (thank you!) from the cortex_0 (Docker volume) directory fixed my problem, and I was able to start up the cortex without any issues.

cortex_0$ ./fix_hugemem.py
Layer 9261972f63e1c41338fb60648d2b67fd version (0, 2, 8)
Updating layer 9261972f63e1c41338fb60648d2b67fd to (0, 2, 7)

vEpiphyte commented 2 years ago

This condition was the result of running off the :master Docker tag and performing an upgrade that was available from commit 2859cec7a08345988c9723732c15cf94f53ca09a, which was reverted the following day in 39e5885d1005ce9671acb25216c461675dd59312. The layer storage had already been migrated to (0, 2, 8), while the reverted code only knew about (0, 2, 7). The internal version check, which prevents a Cortex from downgrading with respect to its storage, then left the Cortex in a state that prevented normal operation.
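
To illustrate the kind of guard involved, here is a minimal sketch of a storage version check. It is not the code from synapse/lib/modelrev.py; the names CantRevLayer, curv, and layv are borrowed from the log output in this issue, while check_layer_version and the revisions list are hypothetical and exist only for this example.

```python
# Illustrative sketch only: this is NOT the code from synapse/lib/modelrev.py.
# CantRevLayer, curv and layv mirror the names in the log output; the function
# check_layer_version and the revisions list are hypothetical.

class CantRevLayer(Exception):
    """Raised when a layer cannot be brought to the version the code expects."""


def check_layer_version(layv, curv, revisions):
    """Return the migration targets to apply, oldest first.

    layv      -- version tuple stored with the layer on disk
    curv      -- newest version tuple the running code knows about
    revisions -- sorted version tuples the running code can migrate to
    """
    if layv > curv:
        # The storage was written by newer code than what is running now.
        # There is no downgrade path, so refuse to start.
        raise CantRevLayer(f'layer is from the future! curv={curv} layv={layv}')
    # Tuples compare element by element, so (0, 2, 7) < (0, 2, 8).
    return [vers for vers in revisions if layv < vers <= curv]


# Hypothetical revision history for the example.
revisions = [(0, 2, 6), (0, 2, 7), (0, 2, 8)]

# Normal upgrade: storage at (0, 2, 7), code at (0, 2, 8).
pending = check_layer_version((0, 2, 7), (0, 2, 8), revisions)

# After the revert: storage already at (0, 2, 8), code back at (0, 2, 7).
try:
    check_layer_version((0, 2, 8), (0, 2, 7), revisions[:2])
    refused = False
except CantRevLayer as exc:
    refused = True
    error_text = str(exc)
```

In this sketch the first call returns the single pending migration, and the second call raises, which corresponds to the startup failure reported above. Refusing to start is a conservative choice: code that does not understand a newer storage format cannot safely read or write it.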

For production / deployment testing purposes, I would recommend using a stable release tag: either a floating release tag such as v2.x.x or a fixed release tag.