Open enaydanov opened 3 months ago
What is your expectation wrt 404? (as opposed to other errors, do you feel there's any point in retrying or something?)
For example, we can show incomplete info and warn the user about it. We can also tell the user what to do next: run nodetool again, check the logs, etc.
I don't think retries are a good idea from a UX point of view.
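To make the "incomplete info plus a warning" idea concrete, here is a minimal sketch in Python (not the actual C++ nodetool code). `EndpointNotFound`, `collect_info`, `fake_fetch`, the field labels, and the `/example/gossip_running` path are all made up for illustration. Only `/storage_service/rpc_server` comes from this issue.

```python
from typing import Callable, Dict, List, Tuple


class EndpointNotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 returned by the REST API."""


def collect_info(
    fetch: Callable[[str], str],
    endpoints: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Query every endpoint; on a 404, record "N/A" and a warning instead of failing."""
    info: Dict[str, str] = {}
    warnings: List[str] = []
    for label, path in endpoints.items():
        try:
            info[label] = fetch(path)
        except EndpointNotFound:
            info[label] = "N/A"
            warnings.append(
                f"{label}: {path} is not available yet; the node may still be "
                "starting, run nodetool again later or check the logs"
            )
    return info, warnings


# Fake transport for illustration: only the Thrift compatibility endpoint is missing.
def fake_fetch(path: str) -> str:
    if path == "/storage_service/rpc_server":
        raise EndpointNotFound(path)
    return "true"


ENDPOINTS = {
    "Gossip active": "/example/gossip_running",      # hypothetical path
    "Thrift active": "/storage_service/rpc_server",  # path from this issue
}

info, warnings = collect_info(fake_fetch, ENDPOINTS)
for label, value in info.items():
    print(f"{label}: {value}")
for message in warnings:
    print(f"warning: {message}")
```

Only the "not found" case is downgraded to a warning here. Any other error still propagates, so nothing else is silently swallowed.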
@tchaikov - it seems to fail on Thrift - which I believe is removed already?
@tchaikov - ping?
sorry, i missed this one. i am on it now.
> @tchaikov - it seems to fail on Thrift - which I believe is removed already?
The symptom is indeed related to Thrift, but the problem is not limited to it. The root cause is that the handler exposing this REST API is not up yet when nodetool accesses the web server. I am not sure what the best way to address this UX issue is. What I can do, though, is minimize the surface that is exposed to this problem.
But please note that the client is not guaranteed to be functional before the server is fully up and running.
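The root cause can be reproduced with a toy model. The `Router` class below is a made-up illustration in Python, not ScyllaDB's HTTP server. It only shows that the same path answers 404 until its handler is registered.

```python
from typing import Callable, Dict, Tuple


class Router:
    """Toy request router: a path is served only after its handler is registered."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    def register(self, path: str, handler: Callable[[], str]) -> None:
        self._handlers[path] = handler

    def handle(self, path: str) -> Tuple[int, str]:
        handler = self._handlers.get(path)
        if handler is None:
            # The web server is already listening, but this route does not exist yet.
            return 404, "Not found"
        return 200, handler()


router = Router()

# A client that connects during startup, before registration, gets a 404.
early = router.handle("/storage_service/rpc_server")

# Late in initialization the handler is finally registered.
router.register("/storage_service/rpc_server", lambda: "false")
late = router.handle("/storage_service/rpc_server")

print(early)
print(late)
```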
We cannot swallow errors just to be more "friendly" to the user. Nodetool cannot be used while ScyllaDB is initializing.
We can add a new REST API endpoint, which is registered very early, and which can be used to poll whether ScyllaDB is ready. When this endpoint returns false, nodetool prints a warning that ScyllaDB is still initializing and the user should try again later.
Note that we have to be careful with this, because some commands might be useful while ScyllaDB is initializing, so we might want to add exceptions to this.
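The gating described above could look roughly like this. It is a sketch in Python under assumptions, not a design for the real endpoint. The readiness check is passed in as a callable because the endpoint does not exist yet. `readiness_warning` and the contents of `EXEMPT_COMMANDS` are hypothetical.

```python
from typing import Callable, FrozenSet, Optional

# Illustrative only: which commands should bypass the check is an open question.
EXEMPT_COMMANDS: FrozenSet[str] = frozenset({"help", "version"})


def readiness_warning(
    command: str,
    is_ready: Callable[[], bool],
    exempt: FrozenSet[str] = EXEMPT_COMMANDS,
) -> Optional[str]:
    """Return a warning if the command should not run yet, otherwise None."""
    if command in exempt:
        # Exempt commands never poll the readiness endpoint.
        return None
    if is_ready():
        return None
    return "ScyllaDB is still initializing, please try again later."


# Simulate a node that is still starting up.
def node_not_ready() -> bool:
    return False


print(readiness_warning("info", node_not_ready))
print(readiness_warning("version", node_not_ready))
```

Checking the exemption list first means exempt commands keep working even if the readiness endpoint itself is unreachable.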
Calling `nodetool info` during a node bootstrap can fail with something like: (from https://argus.scylladb.com/test/4c332b0b-a707-40b2-881f-cf7f37a33b6e/runs?additionalRuns[]=bdfd3c7a-e805-4a56-8eaa-7ddfe8cbc44c)
This is because the handler for the `/storage_service/rpc_server` endpoint is set up very close to the end of initialization: https://github.com/scylladb/scylladb/blob/1094c71282d8841ddb8af98ba5ae761d78572b6d/main.cc#L2086-L2100
The `nodetool info` code doesn't have any error handling: https://github.com/scylladb/scylladb/blob/1094c71282d8841ddb8af98ba5ae761d78572b6d/tools/scylla-nodetool.cc#L916-L922
It is really user-unfriendly to expose a 404 here, especially for an endpoint that exists for compatibility reasons.