Open jonathan-d-zhang opened 11 months ago
Minimum project spec for rabbitMQ integration:
Rationale: The premise is to keep work off client nodes as much as possible. Sending duplicate information to clients and relying on server-side deduplication afterwards wastes an enormous amount of compute. Deduplication must occur before a client ever interfaces with a job, and an effort should be made to handle a broad range of edge cases.
Furthermore:
Some thoughts on authentication - We can use RabbitMQ's built-in Authentication, Authorization, and Access Control feature to provision each client with a username/password combination.
All clients should be restricted to basic.publish on the results queue, and basic.consume on the incoming jobs queue.
Mainframe should also have provisioned credentials with only basic.consume on the results queue.
Loader should have basic.publish on the incoming jobs queue.
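To make the permission split above concrete, here is a minimal sketch of how it might map onto RabbitMQ's per-user configure/write/read regular expressions. In RabbitMQ, basic.publish requires write permission on an exchange and basic.consume requires read permission on a queue. The queue and exchange names, the role names, and the helper function are all hypothetical and only serve as illustration; nothing here is the project's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permissions:
    """The three regexes RabbitMQ stores per user and vhost."""
    configure: str
    write: str
    read: str


# "^$" matches only the empty string, so it grants access to no named resource.
NONE = "^$"

# Hypothetical resource names (assumptions, not taken from the project).
JOBS_QUEUE = "jobs"
RESULTS_QUEUE = "results"
JOBS_EXCHANGE = "jobs_exchange"
RESULTS_EXCHANGE = "results_exchange"

ROLES = {
    # Clients publish results and consume jobs.
    "client": Permissions(NONE, f"^{RESULTS_EXCHANGE}$", f"^{JOBS_QUEUE}$"),
    # Mainframe only consumes results.
    "mainframe": Permissions(NONE, NONE, f"^{RESULTS_QUEUE}$"),
    # Loader only publishes jobs.
    "loader": Permissions(NONE, f"^{JOBS_EXCHANGE}$", NONE),
}


def set_permissions_command(user: str, role: str, vhost: str = "/") -> list:
    """Build the rabbitmqctl argument list that applies a role to a user."""
    p = ROLES[role]
    return ["rabbitmqctl", "set_permissions", "-p", vhost,
            user, p.configure, p.write, p.read]


if __name__ == "__main__":
    print(" ".join(set_permissions_command("client-1", "client")))
```

Because no role gets configure permission in this sketch, the queues and exchanges would have to be declared ahead of time by a separate administrative user.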
On further thought - is there a need for a return queue? Can clients simply POST their results directly to the Dragonfly API? I assume that since they will have to interface with the API anyway to fetch their ruleset when they detect they're out of date, they may as well POST results directly to the API.
Having a return queue likely helps alleviate situations where many clients are scanning many packages (and the API cannot keep up), but I'm not sure that's a goal worth aspiring to right now. It would definitely be an effective way to future-proof the design, though.
The intention when we discussed this was that clients would POST directly to the API anyway. Clients bouncing a single request for rules off the API itself wasn't something I had really considered. Typically these rules are queued with the current ruleset SHA, correct? I don't see that being an issue, but I'd be a little concerned that if we ever spin up multiple clients, and they're moving through packages quickly, we would be making potentially hundreds of requests to this endpoint per second. Not that it's serving that much, and I don't anticipate it would be an issue, but it bears mentioning nonetheless. If we put it behind any sort of rate limiting, we'll likely footgun ourselves in that regard.
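One way to keep request volume on the rules endpoint low is for each client to cache its ruleset and refetch only when a job's SHA differs from the cached one. The sketch below assumes every job carries the current ruleset SHA, which is still an open question above. The `RulesetCache` class and the injected `fetch` callable are hypothetical stand-ins for whatever the client actually uses to call the API.

```python
from typing import Callable, Optional, Tuple


class RulesetCache:
    """Caches the ruleset and refetches only when the job's SHA changes."""

    def __init__(self, fetch: Callable[[], Tuple[str, str]]) -> None:
        # fetch() is assumed to return (sha, rules) from the API.
        self._fetch = fetch
        self._sha: Optional[str] = None
        self._rules: Optional[str] = None

    def rules_for(self, job_sha: str) -> str:
        """Return rules for a job, hitting the API only on a SHA mismatch."""
        if self._rules is None or job_sha != self._sha:
            # Note: a job queued with an outdated SHA would trigger a
            # refetch every time; a real client would need to handle that.
            self._sha, self._rules = self._fetch()
        return self._rules


if __name__ == "__main__":
    calls = []

    def fake_fetch() -> Tuple[str, str]:
        calls.append(1)
        return ("abc123", "rule example {}")

    cache = RulesetCache(fake_fetch)
    for _ in range(100):
        cache.rules_for("abc123")
    print(len(calls))  # number of API requests made for 100 jobs
```

With this pattern, the endpoint is hit once per ruleset change per client rather than once per job, which should keep it well clear of any rate limit.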
Tracking issue for implementing the message queue