microsoft / garnet

Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication features. Garnet can work with existing Redis clients.
https://microsoft.github.io/garnet/
MIT License

Publish/subscribe timeout bug #776

Closed: slqd3781 closed this issue 1 week ago

slqd3781 commented 2 weeks ago

Describe the bug

After using the publish/subscribe feature, the Garnet service throws this exception:

ProcessMessages threw an exception: System.Threading.LockRecursionException: The calling thread already holds the lock.
   at System.Threading.SpinLock.ContinueTryEnter(Int32 millisecondsTimeout, Boolean& lockTaken)
   at Garnet.server.RespServerSession.Publish(Byte& keyPtr, Int32 keyLength, Byte& valPtr, Int32 valLength, Byte& inputPtr, Int32 sid)
   at Garnet.server.SubscribeBroker`3.Broadcast(Byte[] key, Byte valPtr, Int32 valLength, Boolean ascii)
   at Garnet.server.RespServerSession.NetworkPUBLISH()
   at Garnet.server.RespServerSession.ProcessMessages()
   at Garnet.server.RespServerSession.TryConsumeMessages(Byte* reqBuffer, Int32 bytesReceived)

Steps to reproduce the bug

Used StackExchange.Redis.

Expected behavior

No response

Screenshots

No response

Release version

v1.0.35

IDE

No response

OS version

No response

Additional context

No response

Vijay-Nirmal commented 2 weeks ago

Thanks for reporting it. Can you provide more details? We are using StackExchange.Redis as part of integration testing in Garnet. Pub/Sub works as expected there. Can you provide a code snippet? Does it happen when creating 2 or more subscriber instances?

slqd3781 commented 2 weeks ago

[Screenshot of the modified locking code]

I checked the source code and moved only this locking statement into the try/catch block; after that, there were no more timeouts. When the exception occurred, this channel was completely locked, but other channels still worked normally.
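The following is a minimal sketch of why taking a lock before the try block can leave a channel locked for good, and why moving it inside helps. It is not Garnet's code: it is written in Python, and every name in it (OwnerTrackingLock, Subscriber, get_response_buffer, the two publish functions) is hypothetical. OwnerTrackingLock imitates a .NET SpinLock with thread-owner tracking enabled, which throws LockRecursionException when the thread that already holds it tries to enter again. The sketch assumes that some call between acquiring the lock and entering the try block can throw, for example a network sender that has already been disposed.

```python
import threading


class LockRecursionError(RuntimeError):
    """Hypothetical stand-in for .NET's LockRecursionException."""


class OwnerTrackingLock:
    """Non-reentrant lock that fails fast on re-entry by the owning thread,
    imitating a .NET SpinLock created with thread-owner tracking enabled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def enter(self):
        if self._owner == threading.get_ident():
            raise LockRecursionError("The calling thread already holds the lock.")
        self._lock.acquire()  # any other thread blocks here while the lock is held
        self._owner = threading.get_ident()
        return True

    def exit(self):
        self._owner = None
        self._lock.release()


class Subscriber:
    """Hypothetical subscriber session whose network sender may be disposed."""

    def __init__(self):
        self.lock = OwnerTrackingLock()
        self.disposed = False
        self.received = []

    def get_response_buffer(self):
        # Imitates a disposed sender throwing (cf. ObjectDisposedException).
        if self.disposed:
            raise RuntimeError("Cannot access a disposed object.")
        return self.received


def publish_lock_outside_try(sub, msg):
    # Buggy shape: if anything throws between enter() and try, exit() never runs.
    sub.lock.enter()
    buf = sub.get_response_buffer()
    try:
        buf.append(msg)
    finally:
        sub.lock.exit()


def publish_lock_inside_try(sub, msg):
    # Fixed shape: acquire inside try and release in finally only if taken.
    lock_taken = False
    try:
        lock_taken = sub.lock.enter()
        sub.get_response_buffer().append(msg)
    finally:
        if lock_taken:
            sub.lock.exit()


def try_publish(publish, sub, msg):
    """Runs one publish and reports how it ended."""
    try:
        publish(sub, msg)
        return "ok"
    except LockRecursionError:  # must come before RuntimeError (it is a subclass)
        return "LockRecursionError"
    except RuntimeError:
        return "disposed"


if __name__ == "__main__":
    for publish in (publish_lock_outside_try, publish_lock_inside_try):
        sub = Subscriber()
        sub.disposed = True  # the subscriber's connection has gone away
        results = [try_publish(publish, sub, m) for m in ("a", "b")]
        print(publish.__name__, results)
    # publish_lock_outside_try ['disposed', 'LockRecursionError']
    # publish_lock_inside_try ['disposed', 'disposed']
```

In the first variant the leaked lock makes the next publish from the same thread fail with a recursion error, and a publish from any other thread would block on that lock (not demonstrated, since it would hang). In the second variant the lock is always released in finally, so the disposed-sender error still surfaces but no longer locks the channel. The second shape mirrors the pattern the .NET documentation shows for SpinLock.Enter(ref lockTaken).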

slqd3781 commented 2 weeks ago

[FTL] [Session] ProcessMessages threw an exception: System.ObjectDisposedException: Cannot access a disposed object. Object name: 'GarnetTcpNetworkSender'.
   at Garnet.common.GarnetTcpNetworkSender.ThrowDisposed()
   at Garnet.common.GarnetTcpNetworkSender.EnterAndGetResponseObject(Byte& head, Byte& tail)
   at Garnet.server.RespServerSession.Publish(Byte& keyPtr, Int32 keyLength, Byte& valPtr, Int32 valLength, Byte& inputPtr, Int32 sid)
   at Garnet.server.SubscribeBroker`3.Broadcast(Byte[] key, Byte valPtr, Int32 valLength, Boolean ascii)
   at Garnet.server.RespServerSession.NetworkPUBLISH()
   at Garnet.server.RespServerSession.ProcessMessages()
   at Garnet.server.RespServerSession.TryConsumeMessages(Byte* reqBuffer, Int32 bytesReceived)

After changing it to the one shown in the picture, another error occurred

badrishc commented 1 week ago

"After changing it to the one shown in the picture, another error occurred"

What does this mean? Did that change (moving it inside try) not fix the issue? What error occurred?

The linked PR moves it into the try block. Let us know whether this is sufficient. If not, we will need more information and a call stack. Thank you.