Open SwooshyCueb opened 1 year ago
Very interesting ... and kind of obvious now that it's linked :)
Suggests the server has to hold a connection longer-lived than an individual Agent.
Hmm.....
I've been pondering this for a bit, actually. Was something I thought about working on as part of #105, but there's already so much stuff in there that I think I'd rather get a bow on it and do further refactoring of our AMQP/Proton usage as part of a separate effort.
I agree. Handling that in a separate PR feels like the right thing to do.
Instead of sending one AMQP message per RabbitMQ connection, we can store the messages and send them at a later time. This enables the iRODS server to batch messages. It also means the server is less likely to lose messages once they have been written down successfully.
The audit plugin would grow two new configuration options:
- `shared_memory_size_in_bytes`: The size of the shared memory buffer used to hold audit information
- `flush_interval_in_seconds`: The amount of time before an agent attempts to send the messages stored in the shared memory buffer

The audit plugin would be changed to do the following:

- If saving a new message would exceed `shared_memory_size_in_bytes`, flush the messages.
- On each message, check the time elapsed since the last flush against `flush_interval_in_seconds`.
- If `flush_interval_in_seconds` has been satisfied, flush the messages.
- If `flush_interval_in_seconds` has not been satisfied, do nothing.

"Flush the messages" means opening and using a single RabbitMQ connection to send all messages stored in shared memory. Only one agent is allowed to do this.
This solution assumes access to the shared memory is synchronized across agents running on the same machine. This solution does not require zone-wide synchronization.
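The buffering and flush rules above could be sketched like this. This is only an illustration: `AuditBuffer` and its member names are made up for this comment, and the shared-memory placement and interprocess lock that the real design requires are omitted.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Illustrative sketch of the proposed batching logic (names are hypothetical).
// Real code would place the buffer in shared memory and guard it with an
// interprocess mutex so agents on the same machine stay synchronized.
class AuditBuffer
{
  public:
    using flush_handler = std::function<void(const std::vector<std::string>&)>;

    AuditBuffer(std::size_t _max_bytes, std::chrono::seconds _flush_interval, flush_handler _flush)
        : max_bytes_{_max_bytes}
        , flush_interval_{_flush_interval}
        , flush_{std::move(_flush)}
        , last_flush_{std::chrono::steady_clock::now()}
    {
    }

    // Save a message; if it would exceed shared_memory_size_in_bytes, flush first.
    void save(std::string _msg)
    {
        if (bytes_used_ + _msg.size() > max_bytes_) {
            flush_messages();
        }
        bytes_used_ += _msg.size();
        messages_.push_back(std::move(_msg));
    }

    // Flush only if flush_interval_in_seconds has been satisfied; else do nothing.
    void maybe_flush()
    {
        if (std::chrono::steady_clock::now() - last_flush_ >= flush_interval_) {
            flush_messages();
        }
    }

    // "Flush the messages": send everything over one connection, then reset.
    void flush_messages()
    {
        if (!messages_.empty()) {
            flush_(messages_); // e.g. one RabbitMQ connection for the whole batch
        }
        messages_.clear();
        bytes_used_ = 0;
        last_flush_ = std::chrono::steady_clock::now();
    }

    std::size_t size() const noexcept { return messages_.size(); }

  private:
    std::size_t max_bytes_;
    std::chrono::seconds flush_interval_;
    flush_handler flush_;
    std::chrono::steady_clock::time_point last_flush_;
    std::vector<std::string> messages_;
    std::size_t bytes_used_ = 0;
};
```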
Would the messages also flush when the plugin's `stop` operation is invoked? Or is this being managed outside of the agents? "The server" terminology is tripping me up, I think.

> Would the messages also flush when the plugin's `stop` operation is invoked?
That's a possibility. We'd need to investigate and see what the pros/cons are around the `stop` operation.
> Or is this being managed outside of the agents?
It is all happening inside the agent. The good thing about this design is that it's flexible. The audit plugin could simply write things down and let another tool handle sending the messages. The design presented just makes it so that no other tool is necessary.
Have to make sure we write down the new messages in the full buffer case... so...
```
flush_messages() {
    send_all_messages_in_single_connection()
    set_last_flush_time()
}

# check the flush_interval_in_seconds
if enough_seconds_have_passed():
    flush_messages()

# save new message
message_saved = false
while not message_saved:
    if save_message_to_buffer(new_message):
        message_saved = true
    else:
        # buffer is full; flush, then retry the save
        flush_messages()
```
Oh, I see. The flushing is actually managed in the shared memory in addition to the messages to flush. Mental model was missing some screws, as per usual.
I am personally not a fan of batching messages in this way. To me it feels like a janky, overcomplicated workaround for our ownership model.
Instead, I propose we have a single connection per agent, opened in `start()`, reused in `exec_rule()`, and closed in `stop()`.
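A rough C++ sketch of that lifecycle. `amqp::connection` here is a hypothetical stand-in for whatever Proton/AMQP handle the plugin actually holds, and the class is simplified to just the three operations named above.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the plugin's real AMQP/Proton connection type.
namespace amqp
{
    class connection
    {
      public:
        explicit connection(std::string _url) : url_{std::move(_url)} {}
        void send(const std::string& /* _message */) { ++sent_; }
        std::size_t messages_sent() const noexcept { return sent_; }
      private:
        std::string url_;
        std::size_t sent_ = 0;
    };
} // namespace amqp

class audit_plugin
{
  public:
    // start(): open the one connection this agent will use.
    void start(const std::string& _url)
    {
        conn_ = std::make_unique<amqp::connection>(_url);
    }

    // exec_rule(): reuse the existing connection for every PEP message.
    void exec_rule(const std::string& _audit_message)
    {
        if (!conn_) {
            throw std::logic_error{"plugin not started"};
        }
        conn_->send(_audit_message);
    }

    // stop(): close the connection when the agent shuts down.
    void stop() { conn_.reset(); }

    const amqp::connection* get_connection() const noexcept { return conn_.get(); }

  private:
    std::unique_ptr<amqp::connection> conn_;
};
```

One connection then serves every PEP fired during the agent's lifetime, which is where the performance gain comes from.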
That seems reasonable and good.
And we'll still see a performance gain since an Agent can send hundreds of messages if all PEPs are configured to fire.
Will be fun to measure and compare.
We adapted the iRODS plugin to send messages to Kafka instead of over AMQP. Now we see a pretty high rate of new connections to Kafka (coming from the plugin), which decreases the performance of our Kafka broker.
Is it safe to create a singleton Kafka client in the plugin (so a single client per iRODS process instead of per agent)? And if yes, how exactly? We're lacking C++ experience, so any help is very welcome.
Using a long-lived Kafka client is a best practice, just like with AMQP connections, and the Kafka client already takes care of message housekeeping like buffering and flushing.
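For the "how exactly" part, the usual C++ idiom for one shared instance per process is a function-local static (a Meyers singleton). Note the caveat discussed below: each iRODS agent is its own process, so this only deduplicates clients within a single process, not across agents. `kafka_client` and the broker string here are hypothetical placeholders, not a real client API.

```cpp
#include <string>

// Hypothetical stand-in for a real Kafka producer handle (e.g. librdkafka's).
struct kafka_client
{
    explicit kafka_client(std::string _brokers) : brokers{std::move(_brokers)} {}
    std::string brokers;
};

// Meyers singleton: the first call constructs the client; every later call
// returns the same instance. Since C++11, initialization of a function-local
// static is guaranteed to be thread-safe.
// Caveat: this is per process. In iRODS, each agent is a separate process,
// so agents still get one client each; sharing across agents needs a proxy.
kafka_client& shared_kafka_client()
{
    static kafka_client client{"broker1:9092"};
    return client;
}
```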
If I'm understanding correctly, you're asking if a single kafka client can be per iRODS server rather than per agent. Does that sound correct?
If yes, then iRODS doesn't provide any mechanism for doing that, yet. There are ways around that limitation though. Without going into detail, here is one way to deal with it.
With that said, both of those solutions could prove challenging depending on your C++ experience.
> If I'm understanding correctly, you're asking if a single kafka client can be per iRODS server rather than per agent. Does that sound correct?
That's correct, and pretty important since the iRODS server spawns a lot of agents (mainly due to Globus transfers creating a connection for each file to be transferred, if I'm not mistaken). We already implemented your second suggestion (that is, creating the Kafka client in `plugin.start` and destroying it in `plugin.stop`). But that only partly solves our issue.
We'll go for a proxy solution.
Hi @wijnanjo, would love to know where you are doing this work. Please consider making contact at info@irods.org if mentioning here is not desired. Thanks.
- new 5.0 process model, or
- a completely separate binary/service (k8s?)

very interested in any performance / throughput numbers with today's various solutions
At present, we make a new connection for every AMQP message we send. This is not ideal.
From the RabbitMQ documentation (emphasis theirs):
This comes from AMQP 0-9-1-centric and .NET-centric documentation, but it still applies for us.