quickfix-j / quickfixj

QuickFIX/J is a full featured messaging engine for the FIX protocol. - This is the official project repository.
http://www.quickfixj.org

During a resend-request the request for messages from storage allows for unbounded memory usage #639

Open philipwhiuk opened 1 year ago

philipwhiuk commented 1 year ago

When a resend request to infinity (or even just to a large capped number) is made, a huge number of messages can be fetched from storage into memory at once, which can crash the application when it runs out of memory. It's not practical to mitigate this in the storage layer.

In addition, even if the messages fit in memory, without a fix for #271 this can lead to some fairly ugly behaviour where we keep trying to send messages to a session that has already disconnected us.

To Reproduce

Expected behavior: We should fetch the messages in batches and then send them one batch at a time.
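As a rough illustration of the batching idea (not the code from the hot fix or from any PR), here is a minimal sketch. `RangeStore` is a hypothetical stand-in for a message store that can return a sequence-number range, and `BatchedResend`, `batchSize` and `sender` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a message store that can return a sequence range. */
interface RangeStore {
    /** Appends the stored messages with sequence numbers in [begin, end] to out. */
    void get(int begin, int end, List<String> out);
}

final class BatchedResend {
    /**
     * Resends [begin, end] in batches of at most batchSize messages, so that
     * only one batch is requested from the store and held in memory at a time.
     * Returns the number of messages handed to the sender.
     */
    static int resend(RangeStore store, int begin, int end, int batchSize,
                      Consumer<String> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int sent = 0;
        List<String> batch = new ArrayList<>();
        // 'from' is a long so that from + batchSize cannot overflow near Integer.MAX_VALUE.
        for (long from = begin; from <= end; from += batchSize) {
            int to = (int) Math.min(from + batchSize - 1, end);
            batch.clear(); // drop the previous batch before fetching the next one
            store.get((int) from, to, batch);
            for (String message : batch) {
                sender.accept(message);
                sent++;
            }
        }
        return sent;
    }
}
```

With a range of 1 to 10 and a batch size of 4, the store is asked for 1-4, 5-8 and 9-10 in turn. A real implementation would presumably also check between batches whether the session is still connected.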

System information: N/A

Additional context: I've got a hot fix for some of this. I will try to tidy it up and submit it for review.

This probably becomes even more important if we implement #621.

chrjohn commented 1 year ago

As you noted in https://github.com/quickfix-j/quickfixj/issues/621#issuecomment-1572077072, it would probably be good to have a config option that restricts the maximum number of messages that can be resent. The beginning of the range could then be skipped by setting the NewSeqNo field of the SequenceReset message accordingly.
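To make the capping idea concrete, here is a minimal sketch of how the range might be split. It is only an illustration: `ResendPlan`, `cap` and `maxResend` are hypothetical names, not existing QuickFIX/J classes or settings, and the sketch assumes the caller has already resolved an open-ended request to a concrete last sequence number.

```java
/** Hypothetical plan for answering a ResendRequest under a message cap. */
final class ResendPlan {
    /** True if the start of the requested range is skipped with a gap fill. */
    final boolean gapFill;
    /** NewSeqNo for the SequenceReset gap fill; only meaningful if gapFill is true. */
    final int newSeqNo;
    /** First sequence number that is actually resent from storage. */
    final int resendFrom;
    /** Last sequence number that is actually resent from storage. */
    final int resendTo;

    private ResendPlan(boolean gapFill, int resendFrom, int resendTo) {
        this.gapFill = gapFill;
        this.newSeqNo = resendFrom;
        this.resendFrom = resendFrom;
        this.resendTo = resendTo;
    }

    /**
     * Keeps only the last maxResend messages of [begin, end]. Anything before
     * that is meant to be covered by one SequenceReset gap fill whose NewSeqNo
     * is the first sequence number that will really be resent.
     */
    static ResendPlan cap(int begin, int end, int maxResend) {
        if (begin < 1 || end < begin || maxResend < 1) {
            throw new IllegalArgumentException("invalid range or limit");
        }
        long requested = (long) end - begin + 1; // long avoids int overflow
        if (requested <= maxResend) {
            return new ResendPlan(false, begin, end); // within the limit, resend everything
        }
        int from = end - maxResend + 1; // strictly greater than begin in this branch
        return new ResendPlan(true, from, end);
    }
}
```

For a request covering 1 to 1000 with a cap of 100, this yields a gap fill with NewSeqNo 901 followed by a resend of 901 to 1000.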

philipwhiuk commented 1 year ago

Yeah, a maximum resend request size is probably also a good idea. I'd suggest there are several things you could do:

  1. Log out the counter-party (the session is in an invalid state), just as we do when the sequence number is too high
  2. Cap the resend amount to the limit and gap-fill the rest
  3. No limit, use the batching

At some point I also need to look at throttling sends. When we fixed our batching (and we really did want to send them all), we effectively DDoSed our counter-party until they caught up. Ideally we could have throttled it so they stayed connected.
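One way the throttling could look is a simple pacing helper that spaces sends evenly. This is only a sketch under assumptions: `SendThrottle`, `maxPerSecond` and `reserve` are hypothetical names, and nothing here reflects an existing QuickFIX/J feature.

```java
/** Hypothetical pacing helper: allows at most maxPerSecond sends, evenly spaced. */
final class SendThrottle {
    private final long intervalNanos;
    private long nextAllowedNanos;
    private boolean first = true;

    SendThrottle(int maxPerSecond) {
        if (maxPerSecond <= 0) {
            throw new IllegalArgumentException("maxPerSecond must be positive");
        }
        this.intervalNanos = 1_000_000_000L / maxPerSecond;
    }

    /**
     * Reserves the next send slot and returns how many nanoseconds the caller
     * should wait before sending, given the current time nowNanos. The time is
     * passed in so the logic can be tested without sleeping.
     */
    long reserve(long nowNanos) {
        long sendAt;
        // Compare by subtraction so the check also works with System.nanoTime() values.
        if (first || nowNanos - nextAllowedNanos >= 0) {
            sendAt = nowNanos;          // no backlog, send immediately
        } else {
            sendAt = nextAllowedNanos;  // still inside the previous slot, wait for it to end
        }
        first = false;
        nextAllowedNanos = sendAt + intervalNanos;
        return sendAt - nowNanos;
    }
}
```

A resend loop could call `reserve(System.nanoTime())` before each message and pause for the returned duration, for example with `java.util.concurrent.locks.LockSupport.parkNanos`, so that the counter-party receives a steady stream rather than a burst.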

wajncn commented 5 months ago

https://github.com/quickfix-j/quickfixj/issues/778 Could solve your problem

chrjohn commented 2 months ago

@wajncn Actually, there was already a PR created by @philipwhiuk which should solve this: #643. Does this also work for you?