nats-io / nats.net.v1

The official C# Client for NATS
Apache License 2.0

NATSNoRespondersException after upgrading to 1.0.5 (from 1.0.4) when watching large KV buckets #794

Closed: jlumsden-mts closed this issue 1 year ago

jlumsden-mts commented 1 year ago

Defect

Versions of NATS.Client and nats-server: NATS.Client 1.0.5 (works in 1.0.4) with server 2.9.19

OS/Container environment: Windows

Steps or code to reproduce the issue:

Extend TestKeyValue.cs with:

[Fact]
public void TestWatchManyKeys()
{
    const int NUM_MESSAGES = 1000;

    Context.RunInJsServer(c =>
    {
        // get the kv management context
        IKeyValueManagement kvm = c.CreateKeyValueManagementContext();

        // create the bucket
        kvm.Create(KeyValueConfiguration.Builder()
            .WithName(BUCKET)
            .WithMaxHistoryPerKey(10)
            .WithStorageType(StorageType.Memory)
            .Build());

        IKeyValue kvContext = c.CreateKeyValueContext(BUCKET);

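        // put NUM_MESSAGES keys into the bucket
        for (int i = 0; i < NUM_MESSAGES; i++)
        {
            kvContext.Put(i.ToString(), i.ToString());
        }

        TestKeyValueWatcher watcher = new TestKeyValueWatcher(true);

        // watch all keys; in 1.0.5 this call throws NATSNoRespondersException
        var sub = kvContext.Watch(">", watcher, watcher.WatchOptions);

        // wait up to ~1 second (100 x 10 ms) for the watcher to report end of data
        int count = 0;
        while (watcher.EndOfDataReceived == 0 && count < 100)
        {
            Thread.Sleep(10);
            count++;
        }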

        Assert.True(watcher.EndOfDataReceived > 0);

        sub.Unsubscribe();
    });
}

Expected result:

The test should pass in both 1.0.4 and 1.0.5.

Actual result:

The test passes in 1.0.4 but fails in 1.0.5 and later: the Watch method call throws NATSNoRespondersException.

Initially I thought this was caused by my real app watching multiple buckets, but it is reproducible by putting lots of keys into a single bucket. I'm assuming some kind of timeout occurs when it takes too long to reach end of data? If you reduce NUM_MESSAGES to 100, the test passes. I don't think I have more than 100 keys in my real app, but the values are much larger than in this example, so the failure appears to depend on message size.

jlumsden-mts commented 1 year ago

Narrowed it down to this commit; the test passes before it and fails after it: ea5f4b29e2e24791188c7a11fc6ea11b3cfb5f5e

scottf commented 1 year ago

No responders are available for the request.
   at NATS.Client.Connection.RequestSyncImpl(String subject, MsgHeader headers, Byte[] data, Int32 offset, Nullable`1 count, Int32 timeout) in C:\nats\nats.net\src\NATS.Client\Connection.cs:line 2961
   at NATS.Client.Connection.Request(String subject, Byte[] data, Int32 timeout) in C:\nats\nats.net\src\NATS.Client\Connection.cs:line 3048
   at NATS.Client.JetStream.JetStreamBase.RequestResponseRequired(String subject, Byte[] bytes, Int32 timeout) in C:\nats\nats.net\src\NATS.Client\JetStream\JetStreamBase.cs:line 164
   at NATS.Client.JetStream.JetStreamBase.GetConsumerInfoInternal(String streamName, String consumer) in C:\nats\nats.net\src\NATS.Client\JetStream\JetStreamBase.cs:line 65
   at NATS.Client.JetStream.JetStream.LookupConsumerInfo(String lookupStream, String lookupConsumer) in C:\nats\nats.net\src\NATS.Client\JetStream\JetStream.cs:line 449
   at NATS.Client.JetStream.JetStreamPushAsyncSubscription.GetConsumerInformation() in C:\nats\nats.net\src\NATS.Client\JetStream\JetStreamPushAsyncSubscription.cs:line 45
   at NATS.Client.KeyValue.KeyValueWatchSubscription..ctor(KeyValue kv, String keyPattern, IKeyValueWatcher watcher, KeyValueWatchOption[] watchOptions) in C:\nats\nats.net\src\NATS.Client\KeyValue\KeyValueWatchSubscription.cs:line 83
   at NATS.Client.KeyValue.KeyValue.Watch(String key, IKeyValueWatcher watcher, KeyValueWatchOption[] watchOptions) in C:\nats\nats.net\src\NATS.Client\KeyValue\KeyValue.cs:line 156
   at IntegrationTests.TestKeyValue.<>c.<TestWatchManyKeys>b__30_0(IConnection c) in C:\nats\nats.net\src\Tests\IntegrationTests\TestKeyValue.cs:line 1259
   at IntegrationTests.SuiteContext.RunInJsServer(TestServerInfo testServerInfo, Action`1 test) in C:\nats\nats.net\src\Tests\IntegrationTests\TestSuite.cs:line 124
   at IntegrationTests.KeyValueSuiteContext.RunInJsServer(Action`1 test) in C:\nats\nats.net\src\Tests\IntegrationTests\TestSuite.cs:line 423
   at IntegrationTests.TestKeyValue.TestWatchManyKeys() in C:\nats\nats.net\src\Tests\IntegrationTests\TestKeyValue.cs:line 1238

scottf commented 1 year ago

I think I figured it out. PR coming soon.

scottf commented 1 year ago

@jlumsden-mts Thank you for taking the time to document this. I flat out missed something. It's fixed in https://github.com/nats-io/nats.net/pull/795

jlumsden-mts commented 1 year ago

No problem @scottf, thanks for sorting it out so quickly.