File System Transport - Fails if dispatch queue directory doesn't exist.

dazinator commented 1 year ago

Consider two docker containers set to use the file system transport, both have the same volume mounted.

When container A initialises, it creates its own input queue as a directory on the volume. After start-up it may try to publish a message to container B. If container B has not started yet, and so hasn't created its own input queue directory, publishing fails with an exception saying that the directory doesn't exist.

Rebus.Retry.ErrorTracking.InMemErrorTracker[0]
2023-09-06 13:23:09 Unhandled exception 1 while handling message with ID "c96e6304-cbe7-45eb-834b-82eb9c537209"
2023-09-06 13:23:09 System.IO.DirectoryNotFoundException: Could not find a part of the path '/root/.config/foo/agent/20230906-122309-6a6085f561c24c9899ad4151ab6d9c12-000000.rebusmessage.json'.
2023-09-06 13:23:09    at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func2 errorRewriter)
2023-09-06 13:23:09    at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
2023-09-06 13:23:09    at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
2023-09-06 13:23:09    at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
2023-09-06 13:23:09    at Rebus.Transport.FileSystem.FileSystemTransport.SendOutgoingMessages(IEnumerable1 outgoingMessages, ITransactionContext context)
2023-09-06 13:23:09    at Rebus.Transport.AbstractRebusTransport.<>c__DisplayClass3_1.<b__1>d.MoveNext()
2023-09-06 13:23:09 --- End of stack trace from previous location ---
2023-09-06 13:23:09    at Rebus.Transport.TransactionContext.InvokeAsync(Func2 actions)
2023-09-06 13:23:09    at Rebus.Transport.TransactionContext.Commit()
2023-09-06 13:23:09    at Rebus.Retry.Simple.DefaultRetryStep.Process(IncomingStepContext context, Func1 next)

Consider whether it would be better to create the directory if it doesn't exist, so the message is not lost, then when container B starts it can pick up the message.

mookid8000 commented 1 year ago

> Consider whether it would be better to create the directory if it doesn't exist, so the message is not lost, then when container B starts it can pick up the message.

While I can understand how that would be convenient in some cases, that's not how Rebus works with any other transport (at least when possible), so you should normally ensure that either (a) Rebus instances are started in order, bottom-up considering their dependencies, or (b) "queues" (whatever that means for your choice of transport) are created manually somehow before starting up.
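For the file system transport, option (b) could be sketched roughly as below. This is only an illustration under assumptions, not something Rebus provides: it assumes each queue is a subdirectory named after the queue inside the shared base directory (which is what the path in the stack trace above suggests), and `QueueDirectories` / `EnsureExist` are hypothetical names. It uses only `System.IO`, so it can run in every container before the bus starts.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Rebus.
public static class QueueDirectories
{
    // Creates <baseDirectory>/<queueName> for each queue name given.
    // Directory.CreateDirectory does nothing when the directory already
    // exists, so this is safe to call on every start-up.
    public static void EnsureExist(string baseDirectory, params string[] queueNames)
    {
        foreach (var queueName in queueNames)
        {
            Directory.CreateDirectory(Path.Combine(baseDirectory, queueName));
        }
    }
}
```

A container could then call something like `QueueDirectories.EnsureExist("/root/.config/foo", "agent", "server")` before configuring Rebus, where `"server"` is a made-up name for the other container's input queue.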

I hope that makes sense to you 🙂

dazinator commented 1 year ago

Ok, no problem. I only encountered this in the edge case of running a docker compose setup where I decided to debug one container without the other running, using a new volume where the other hadn't created its queue folder yet. In the typical case this doesn't happen, because the other container has run at least once in the past and so its input queue folder exists.

Next I am going to try using the file system transport on a cluster, where each container can be on a different machine, but I will be mounting the same file share for them to use (Azure file share). Is there anything explicit that makes the file system transport unsuitable for small scale production workloads? Or perhaps not suitable for use with an Azure file share? I would typically be dealing with tens to hundreds of messages per day. I want to avoid the cost of standard tier service bus, and the overhead of hosting / managing additional databases.

mookid8000 commented 1 year ago

> Is there anything explicit that makes the file system transport unsuitable for small scale production workloads? Or perhaps not suitable for use with an Azure file share?

Yes there is, actually. 😅 First off, file locking seems to work differently (or rather: not work as it should to support Rebus' needs) when running under Linux, so you should probably not do that.

Since this is the case under Linux, I would also be very afraid that any kind of shared file system (SAN, whatever) would not provide the locking necessary for multiple processes running on multiple hosts to lock the files they are processing.

If you can settle for only running one Rebus instance with a parallelism of 1 for each input queue, then I believe everything should work fine (even on Linux / SAN / ...) though, because then everything should behave normally even if file locks don't work as they do on Windows.
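As a rough sketch of that setup, a bus limited to one worker and a parallelism of 1 might be configured like this. The base directory and queue name are assumptions read off the path in the stack trace above, and the calls (`UseFileSystem`, `SetNumberOfWorkers`, `SetMaxParallelism`) should be checked against the Rebus version in use. Note that this only limits concurrency inside one process; making sure that only one instance reads each input queue is still up to the deployment.

```csharp
using System;
using Rebus.Activation;
using Rebus.Config;

// Assumed values, inferred from the path in the stack trace.
const string baseDirectory = "/root/.config/foo";
const string inputQueueName = "agent";

using var activator = new BuiltinHandlerActivator();

Configure.With(activator)
    .Transport(t => t.UseFileSystem(baseDirectory, inputQueueName))
    .Options(o =>
    {
        // One worker thread and at most one message in flight,
        // so correctness does not depend on file locks within this process.
        o.SetNumberOfWorkers(1);
        o.SetMaxParallelism(1);
    })
    .Start();

// Keep the process alive while the bus receives messages.
Console.ReadLine();
```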

But with that said, I STRONGLY recommend that you provision an Azure Service Bus instance and use that instead! It's a real queueing system, and as such it is designed for what Rebus is going to use it for – and on the Standard tier, it is really inexpensive when dealing with so few messages! With e.g. 1000 messages per day I'd expect your monthly bill to be < $20.

dazinator commented 1 year ago

Ah Ok! I will investigate pricing, I had assumed it was much more expensive ;-) - Thank you.

dazinator commented 1 year ago

Tried it out. The message appeared to be handled multiple times so I guess this is the lock mechanism not working like you mentioned.

mookid8000 commented 1 year ago

> (...) so I guess this is the lock mechanism not working like you mentioned.

😅 yeah, sorry about that! But again, I can really recommend a real queueing system, especially when it's as inexpensive as Azure Service Bus. There's of course also the Azure Storage Queues option, which I believe could cost you even less, especially if you don't have that many Rebus instances.

rebus-org / Rebus

File System Transport - Fails if dispatch queue directory doesn't exist. #1114