Closed tahpot closed 3 years ago
I think this is resolved, but will wait and see how it goes in the test environment before closing this issue.
Not resolved, still occurring.
Testnet has been updated with this branch to try and catch this error:
https://github.com/verida/vault-auth-server/tree/bug/9-ceramic-gateway-error
Still not working. A slightly different error crashed it this time (503 - Service Temporarily Unavailable):
/data/apps/verida-js/node_modules/@ceramicnetwork/http-client/node_modules/rxjs/dist/cjs/internal/util/reportUnhandledError.js:13
throw err;
^
Error: HTTP request to 'https://ceramic-clay.3boxlabs.com/api/v0/streams/kjzl6cwe1jw1482051achqlcwqlsuhj77y80npr9u3ay3ri5d8k4r5ngmz3pc9x?sync=0' failed with status 'Service Temporarily Unavailable': <html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
</body>
</html>
at Object.fetchJson (/data/apps/verida-js/node_modules/@ceramicnetwork/common/src/utils/http-utils.ts:19:11)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:94:5)
at Function._load (/data/apps/verida-js/node_modules/@ceramicnetwork/http-client/src/document.ts:109:23)
at Document._syncState (/data/apps/verida-js/node_modules/@ceramicnetwork/http-client/src/document.ts:59:19)
[nodemon] app crashed - waiting for file changes before starting...
For now, will start the service using PM2 so it will auto-restart.
PM2 is installed on testnet server.
Example script added to the feature branch: https://github.com/verida/vault-auth-server/blob/bug/9-ceramic-gateway-error/prod.sh
Logs can be checked by:
cat ~/.pm2/logs/vault-auth-server-error.log
cat ~/.pm2/logs/vault-auth-server-out.log
That still didn't work.
The server crashed and PM2 didn't detect it and restart.
I suspect the issue is the server is being run via nodemon, which doesn't actually die.
Have created a new babel build process and a package.json script to run the server directly via node instead of nodemon.
This is now running on testnet.
This still didn't work either.
The server crashed and PM2 still didn't detect it and restart.
Logs indicated contextConfig was undefined after catching a Ceramic error, which caused the crash.
Have applied a fix and deployed to testnet.
There were multiple issues; however, I believe all the errors are now caught and handled with better error messages, which prevents the application from crashing.
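For reference, a minimal sketch of the kind of handling described above, not the actual fix that was deployed. The names `safely`, `requireConfig`, `Result` and the `ContextConfig` shape are hypothetical; the real contextConfig fields are not shown in this thread. The idea is to turn a thrown error or rejected promise into a value with a readable message, and to fail with a clear message when the config is missing instead of crashing later on an undefined access.

```typescript
// Hypothetical result type: either a value or a readable error message.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Runs an async operation and converts any thrown error or rejection
// into a Result, so the caller can respond instead of crashing.
async function safely<T>(label: string, op: () => Promise<T>): Promise<Result<T>> {
  try {
    return { ok: true, value: await op() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `${label} failed: ${message}` };
  }
}

// Hypothetical config shape (an assumption, for illustration only).
interface ContextConfig {
  name: string;
}

// Guards against using a config that was never populated,
// e.g. because an earlier Ceramic call failed.
function requireConfig(config: ContextConfig | undefined): ContextConfig {
  if (config === undefined) {
    throw new Error("contextConfig is undefined");
  }
  return config;
}
```

A request handler could then check `result.ok` and return `result.error` to the client rather than letting the exception propagate.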
It appears pm2 doesn't detect when the app actually crashes; however, it's a useful tool so I'll leave it as the recommended way to run the server.
Ceramic seems to throw random 502 Bad Gateway error messages. See here:
Need to explore how to handle these and eventually log / track them for monitoring purposes.
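One possible approach, sketched under assumptions: retry transient gateway errors with exponential backoff and count them for later monitoring. `withRetry`, `statusOf` and `transientCounts` are hypothetical names. The sketch assumes the error carries a numeric `status` property; the Ceramic client error in the trace above only shows the status in its message text, so this would need adapting. Treating 504 as transient alongside 502 and 503 is also an assumption.

```typescript
// Gateway statuses treated as transient (502 and 503 appear in this thread).
const TRANSIENT_STATUSES = new Set<number>([502, 503, 504]);

// Counts transient failures per status so they can be reported to monitoring.
const transientCounts = new Map<number, number>();

// Reads a numeric `status` property from an unknown error, if present.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Retries `op` on transient statuses only; other errors are rethrown at once.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const status = statusOf(err);
      if (status === undefined || !TRANSIENT_STATUSES.has(status)) throw err;
      transientCounts.set(status, (transientCounts.get(status) ?? 0) + 1);
      if (attempt >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Wrapping each Ceramic call as `withRetry(() => someCeramicCall())` would smooth over short outages, and `transientCounts` could be logged periodically to see how often the gateway fails.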
Need to consider spinning up our own Ceramic testnet infrastructure.