nats-io / natscli

The NATS Command Line Interface
Apache License 2.0

`micro list` timeout when micro service process paused via debugger #1065

Closed Codebreaker101 closed 1 month ago

Codebreaker101 commented 1 month ago

Observed behavior

When debugging a micro service and a breakpoint is reached, `nats micro list` returns `No results received` after a 5 second timeout. This does not happen on every run of the steps to reproduce; roughly 90% of the time would be more apt. In the other 10%, the command immediately returns `nats: error: no micro instances found`. After 8 minutes of the program being stopped on a breakpoint, the timeouts stop and `nats micro list` goes back to returning `nats: error: no micro instances found`.

I've observed that when multiple services (the same or different) are running in a different process, `nats micro list` returns the list of services immediately, without a timeout and without listing the one that is paused on a breakpoint.

This feels like there is an inconsistency when handling communication with services.

0 services - immediate return with no instances found
1 blocked service  - 5s timeout with no result received 
1 blocked service, 1 working service - immediate return with 1 working instance found

This might not be that big of an issue (or it might be working as expected), but it might be a symptom of a yet undiscovered problem, hence the reason for opening this issue.

Expected behavior

I would think that either of these two sets of results would be more consistent than the current behavior, especially since the timeout does not occur every time when running the steps to reproduce:

Option A
0 services - immediate return with no instances found
1 blocked service - immediate return with no instances found
1 blocked service, 1 working service - immediate return with 1 working instance found

Option B
0 services - immediate return with no instances found
1 blocked service - 5s timeout with no result received
1 blocked service, 1 working service - 5s timeout with return of 1 working instance found

Server and client version

nats server - v2.10.16
nats cli client - v0.1.4
nats go client - v1.34.1

Host environment

PopOS 22.04 LTS
VSCodium 1.89.1

Steps to reproduce

Start nats server: `docker run -p 4222:4222 nats:alpine`
Start watching for list of micro services: `watch -n 1 nats --server=nats://127.0.0.1:4222 micro list`
Paste this code into a test file:

// Imports the test file needs for this snippet.
import (
    "fmt"
    "testing"
    "time"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/micro"
)

func TestBlockNatsMicro(t *testing.T) {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        t.Fatal(err)
    }

    // Register a service with a single endpoint whose handler does nothing.
    _, err = micro.AddService(nc, micro.Config{
        Name:    "test",
        Version: "0.0.2",
        Endpoint: &micro.EndpointConfig{
            Subject: "test",
            Handler: micro.HandlerFunc(func(r micro.Request) {}),
        },
    })
    if err != nil {
        t.Fatal(err)
    }

    // At this point the service should be visible in the console.
    time.Sleep(5 * time.Second)
    // When this breakpoint is reached, `nats micro list` returns
    // "No results received" after a 5 second timeout.
    fmt.Println("breakpoint")
    // At this point the service should be visible in the console again.
    time.Sleep(5 * time.Second)
}

Add a breakpoint on line `fmt.Println("breakpoint")`
Run the test with debugger
Observe the results

NOTE: if the watched nats command does not return `No results received`, restart the debugger, since the issue does not always occur.

ripienaar commented 1 month ago

If the service is suspended but the NATS connection has not timed out, the server still expects the client to be there (so no “no responders” error).

I don’t think there is anything to be done here. The CLI has no way of knowing what is going on: the server says someone is there, so we wait.

The only remedy is to increase the ping frequency so the stale client is detected earlier.
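For illustration, a sketch of what more frequent pings could look like in a nats-server configuration file, assuming the server-side `ping_interval` and `ping_max` options. The values are hypothetical and not a recommendation from this thread; shorter intervals mean more ping traffic and a higher chance of dropping clients that are merely slow.

```
# nats-server.conf (illustrative values only)
# How often the server pings each client connection.
ping_interval: "10s"
# Unanswered pings tolerated before the connection is closed as stale.
ping_max: 2
```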

Codebreaker101 commented 1 month ago

Shouldn't the same timeout occur when there are multiple different services running and one is suspended? Currently the timeout only occurs when there is one service running and it is suspended.

ripienaar commented 1 month ago

When there are responses, it takes whatever responses it gets, waits 300ish ms after the last one, and shows you what it got.

However, for the very first response it will wait for the full configured timeout, which is the most reliable way to do it. While your app is suspended, it's no different from a machine with some packet loss or a similarly unreliable connection, so we wait.
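The waiting behavior described above can be sketched roughly as follows. This is a minimal model using only the Go standard library, not the CLI's actual code: `gather` and `simulate` are hypothetical names, service instances are simulated with goroutines, and a suspended instance is modelled as one that never replies. The immediate "no responders" case for zero services is signalled by the server and is not modelled here.

```go
package main

import (
	"fmt"
	"time"
)

// gather waits up to timeout for the first reply. After each reply it keeps
// reading until no further reply arrives within stall of the previous one.
func gather(replies <-chan string, timeout, stall time.Duration) []string {
	var got []string
	wait := timeout // full timeout applies only until the first reply
	for {
		select {
		case r := <-replies:
			got = append(got, r)
			wait = stall // from now on, only wait the short stall window
		case <-time.After(wait):
			return got
		}
	}
}

// simulate starts one goroutine per entry in delaysMs, each standing in for a
// service instance that replies after that many milliseconds. A suspended
// instance is modelled by leaving it out: it never replies.
func simulate(delaysMs []int, timeoutMs, stallMs int) []string {
	replies := make(chan string, len(delaysMs)) // buffered so senders never block
	for i, d := range delaysMs {
		go func(id, delay int) {
			time.Sleep(time.Duration(delay) * time.Millisecond)
			replies <- fmt.Sprintf("instance-%d", id)
		}(i, d)
	}
	return gather(replies,
		time.Duration(timeoutMs)*time.Millisecond,
		time.Duration(stallMs)*time.Millisecond)
}

func main() {
	// Scaled-down stand-ins for the 5s timeout and the ~300ms stall window.
	const timeoutMs, stallMs = 500, 30

	start := time.Now()
	got := simulate(nil, timeoutMs, stallMs) // only a suspended service
	fmt.Printf("suspended only: %d replies, waited %v\n", len(got), time.Since(start))

	start = time.Now()
	got = simulate([]int{5}, timeoutMs, stallMs) // one suspended, one working
	fmt.Printf("suspended + working: %d replies, waited %v\n", len(got), time.Since(start))
}
```

In this model the first scenario waits out the whole timeout with nothing to show, while the second returns shortly after the working instance replies, which matches the difference reported in this issue.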

Codebreaker101 commented 1 month ago

That clears everything up!