Open jkirschner-hashicorp opened 3 years ago
@dnephin: I'm not actually sure this is accurate:
`PUT /catalog/deregister`: requires node ID, not name
The docs for deregister do say that the node argument must be "the ID of the node", but I think the code suggests that it's the node name that is used, not node ID. Node name is supposed to be unique within the cluster, so that seems reasonable?
Q: What's your read on whether node name or node ID is used for `/catalog/deregister`? I can update the docs if needed.
Below is my attempt to trace through the code, though I feel like I went wrong somewhere... (because I ended up in a services table, not a node table)
- `Node` arg from the request gets passed to catalog_endpoint `Deregister`: https://github.com/hashicorp/consul/blob/997547bd7fc7719bf4bc5ee608743408ad0ec9e5/agent/consul/catalog_endpoint.go#L334-L337
- `State.NodeService`: https://github.com/hashicorp/consul/blob/997547bd7fc7719bf4bc5ee608743408ad0ec9e5/agent/consul/catalog_endpoint.go#L360
- `getNodeServiceTxn` and performs a query using node name: https://github.com/hashicorp/consul/blob/997547bd7fc7719bf4bc5ee608743408ad0ec9e5/agent/consul/state/catalog.go#L1252-L1256

Sorry, just catching up on this now. I think the second line you link (`state.NodeService`) is only when `ServiceID != ""`, and for a node deregister that should be the empty string, so we'd skip that line.
The first place I see the `Node` field being referenced is in checking the ACL permissions here:
https://github.com/hashicorp/consul/blob/01e974046725f235ee954ca121dd08b2e975bb6e/agent/consul/catalog_endpoint.go#L403-L405
Our ACL docs for the node rule say it expects a node name, and I see other callers passing a node name, so the ACL authorization definitely appears to expect node name (not ID).
After that it goes into `raftApply`, which ends up in the FSM here:
https://github.com/hashicorp/consul/blob/01e974046725f235ee954ca121dd08b2e975bb6e/agent/consul/fsm/commands_oss.go#L172
And into the state store here: https://github.com/hashicorp/consul/blob/01e974046725f235ee954ca121dd08b2e975bb6e/agent/consul/state/catalog.go#L546-L548
This query for `tableNodes`, `indexID` is matching the `structs.Node.Node` field, which is indeed the name.
So it does seem our docs are wrong: this API uses node name, not node ID.
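To make that conclusion concrete, here is a minimal sketch of a request body for `PUT /v1/catalog/deregister` that removes a whole node. The `DeregisterRequest` struct and `buildNodeDeregister` helper are local illustrations, not types from the Consul codebase, and the values `dc1` and `web-node-1` are made up. Only the JSON field names (`Datacenter`, `Node`, `ServiceID`) come from the endpoint's documented payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeregisterRequest is a local illustration of the JSON body sent to
// PUT /v1/catalog/deregister. It is not a type from the Consul codebase.
type DeregisterRequest struct {
	Datacenter string `json:",omitempty"`
	Node       string // per the trace above, this is the node NAME
	ServiceID  string `json:",omitempty"` // left empty for a node deregister
}

// buildNodeDeregister returns the JSON body for deregistering a whole node.
func buildNodeDeregister(datacenter, nodeName string) (string, error) {
	body, err := json.Marshal(DeregisterRequest{Datacenter: datacenter, Node: nodeName})
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// "dc1" and "web-node-1" are hypothetical values.
	body, err := buildNodeDeregister("dc1", "web-node-1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

Because `ServiceID` is empty, it is omitted from the body, which matches the node-deregister case discussed above where the `ServiceID != ""` branch is skipped.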
Regarding:
GET /agent/service/:service_id: requires service ID, not name
The current error message is "Unknown service ID:
Feature Description
Some entities within Consul have an ID and a Name (e.g., services). When the CLI or an API endpoint for such an entity specifically requires an ID, and will not work with a Name, Consul should provide an error message that makes this mistake clear to the user.
Example taken from #3122: let's say a user has a service with ID `web-service-id` and Name `web-service-name`. If the user executes `/v1/agent/service/deregister/web-service-name`, the following is output to the log:

That error message is very misleading, as a service with that name does exist... just not a service with that ID. The user might reasonably conclude something is wrong with the Consul catalog / state, when the actual problem is that the wrong field was used in the API call. Perhaps if Consul fails to look up a service with that ID, it could check whether any services exist with that name and, if some do, output a clearer warning message... such as:
This description will be updated to include relevant cases in the list below.
API:
- `/agent/service/:service_id`: requires service ID, not name
- `/agent/service/deregister/:service_id`: requires service ID, not name (potentially related issues: #9861, #3122, #9415). See PR https://github.com/hashicorp/consul/pull/10894.
- `/catalog/deregister`: requires node ID, not name (user has tried to deregister using the node name in this issue and in the past #10848)... EDIT: the docs are wrong, node name is required, not ID. Need to update the docs.

Use Case(s)
Ease troubleshooting of any CLI or API endpoints that accept an ID but not a name.