hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io
Other
28.41k stars 4.43k forks source link

Provide clearer warning when Name is used when ID is needed instead #10861

Open jkirschner-hashicorp opened 3 years ago

jkirschner-hashicorp commented 3 years ago

Feature Description

Some entities within Consul have an ID and a Name (e.g., services). When the CLI or an API endpoint for such an entity specifically requires an ID, and will not work with a Name, Consul should provide an error message that makes this mistake clear to the user.

Example taken from #3122: let's say a user has a service with ID web-service-id and Name web-service-name. If the user executes /v1/agent/service/deregister/web-service-name, the following is output to the log:

2017/06/07 16:59:47 [WARN] agent: Failed to deregister service "web-service-name": Service does not exist

That error message is very misleading, as a service with that name does exist... just not a service with that id. The user might reasonably conclude something is wrong with the Consul catalog / state, when the actual problem is that the wrong field was used in the API call. Perhaps if Consul fails to lookup a service with that id, it could lookup whether any services exist with that name and, if some do, output a clearer warning message... such as:

2017/06/07 16:59:47 [WARN] agent: Failed to deregister service "web-service-name": a service ID must be used to deregister a service, not a service name.

This description will be updated to include relevant cases in the list below.

API:

Use Case(s)

Ease troubleshooting of any CLI or API endpoints that accept an ID but not a name.

jkirschner-hashicorp commented 3 years ago

@dnephin : I'm not actually sure this is accurate

PUT /catalog/deregister: requires node ID, not name

The docs for deregister do say that the node argument must be "the ID of the node", but I think the code suggests that it's the node name that is used, not node ID. Node name is supposed to be unique within the cluster, so that seems reasonable?

Q: What's your read on whether node name or node ID is used for /catalog/deregister? I can update the docs if needed.

Below is my attempt to trace through the code, though I feel like I went wrong somewhere... (because I ended up in a services table, not a node table)


dnephin commented 3 years ago

Sorry, just catching up on this now. I think the second line you link (state.NodeService) is only when ServiceID != "", and for a node deregister that should be the empty string, so we'd skip that line.

The first place I see the Node field being referenced is in checking the ACL permissions here: https://github.com/hashicorp/consul/blob/01e974046725f235ee954ca121dd08b2e975bb6e/agent/consul/catalog_endpoint.go#L403-L405

Our acl docs for node rule say it is expecting a node name, and I see other callers passing a node name, so definitely the ACL authorization appears to expect node name (not ID).

After that it goes into raftApply, which ends up in the FSM here: https://github.com/hashicorp/consul/blob/01e974046725f235ee954ca121dd08b2e975bb6e/agent/consul/fsm/commands_oss.go#L172

And into the state store here: https://github.com/hashicorp/consul/blob/01e974046725f235ee954ca121dd08b2e975bb6e/agent/consul/state/catalog.go#L546-L548

This query for tableNodes, indexID is matching the structs.Node.Node field, which is indeed the name.

So it does seem our docs are wrong, this API uses node name, not node ID.

jkirschner-hashicorp commented 2 years ago

Regarding:

GET /agent/service/:service_id: requires service ID, not name

The current error message is "Unknown service ID: ". This makes it clear that an ID is needed, not name.