quickfixgo / quickfix

The Go FIX Protocol Library :rocket:
https://www.quickfixgo.org/
Other
731 stars 287 forks source link

ParseMessage messes up order of repeating group tags #568

Closed varunbpatil closed 4 months ago

varunbpatil commented 1 year ago

Steps to reproduce:

  1. Create a NOS message with a party ID repeating group.
  2. Convert it into bytes by calling msgBytes := nos.ToMessage().Bytes().
  3. Parse the bytes back into a quickfix message using qfMsg := quickfix.ParseMessage(msgBytes).
  4. Print the bytes from the parsed quickfix message using qfMsg.Bytes(). This should be equal to msgBytes which it is because qfMsg.Bytes() simply returns the rawMessage field.
  5. Now call qfMsg.Build() - I've made the build() method public to show the bug that happens when I actually try to send qfMsg to the counterparty.
  6. You will see that the order of tags in the party ID repeating group as a result of calling qfMsg.Build() is wrong.


Here is the code to reproduce the above issue (assuming you point to a quickfix code which has the build() method made public).

package main

import (
    "bytes"
    "fmt"
    "log"
    "time"

    "github.com/quickfixgo/enum"
    "github.com/quickfixgo/field"
    fix44nos "github.com/quickfixgo/fix44/newordersingle"
    "github.com/quickfixgo/quickfix"
    _ "github.com/quickfixgo/quickfix"
)

func main() {
    nos := fix44nos.New(
        field.NewClOrdID("1234"),
        field.NewSide(enum.Side_BUY),
        field.NewTransactTime(time.Now()),
        field.NewOrdType(enum.OrdType_LIMIT),
    )
    partyIDsgrp := fix44nos.NewNoPartyIDsRepeatingGroup()
    partyIDs := partyIDsgrp.Add()
    partyIDs.SetPartyID("party-in-da-house")
    partyIDs.SetPartyRole(enum.PartyRole_CLIENT_ID)
    nos.SetNoPartyIDs(partyIDsgrp)

    qfMsg := quickfix.NewMessage()
    if err := quickfix.ParseMessage(qfMsg, bytes.NewBuffer(nos.ToMessage().Bytes())); err != nil {
        log.Fatal(err)
    }

    // These two should be equal, but they're not. I've made qfMsg.Build() public for this example,
    // to show what message is being sent to the counterparty. The second print shows messed up
    // order of the party ID repeating group.
    fmt.Println(string(qfMsg.Bytes()))
    fmt.Println(string(qfMsg.Build()))
}


Here is the output of running the above code.

8=FIX.4.49=8235=D11=123440=254=160=20230622-08:23:27.847453=1448=party-in-da-house452=310=075
8=FIX.4.49=8235=D11=123440=254=160=20230622-08:23:27.847448=party-in-da-house452=3453=110=075

You will see that in the second print statement the party ID repeating group tags are (448, 452, 453), but it should have been (453, 458, 452).

varunbpatil commented 1 year ago

The problem is on this line. https://github.com/quickfixgo/quickfix/blob/098031ef53795b394a9e33a76cf77b96966c1641/message.go#L415

write() sorts the tags in the body without taking into account repeating groups. https://github.com/quickfixgo/quickfix/blob/098031ef53795b394a9e33a76cf77b96966c1641/field_map.go#L317-L326

steelkorbin commented 1 year ago

Good catch, the arbitrary use of sort() in the sortedTags is not FIX protocol compliant and removes a key advantage of using the FIX protocol for integrations. That being said, the sort here is just a hard coded hidden rule based on an opinion matching checks imposed by quickfix downstream from here and might be an oversight or lazy man's "quick fix" avoiding the work to validate correctly. The real issue is what sortedTags more than likely should be doing and how quickfix is not supporting an aspect of the FIX protocol, instead just opting to take a heavy hand and sort everything on write. What should be happening here is the sortedTags should be enforcing the repeating group sort order that the server expects in the agreed repeating group template. During integrations of a client to a server the client will format the repeating groups according to the tag order imposed by the server, not just numerical order by default. This test also exposes an issue with enforcing arbitrary sort order in the header, body, or trailer that might cause message failure. Example: many servers expect headers to have 8,9, then 35 as the first three with no preference for the remaining header tags, or when looking at the body being numerically sorted might land a non repeating group tag inside a repeating group entry destroying it, etc... This issue has been overlooked, maybe by design for work effort or opinion,but it is a hard coded rule opposite of the FIX protocol on this point that needs to be reviewed for correction. I think the other reason this has been overlooked is because integrations go through a process of alignment until the messages flow and agree, so we move on none the wiser. We do this assuming that during our work our implementation is what needs to change or be corrected to get things talking, especially on first use, because normally it is our fault things are not working.

varunbpatil commented 1 year ago

Hi @steelkorbin , I couldn't agree more. Totally understand how this might have slipped through. I mean, if I hadn't called ParseMessage, but instead sent the NOS message directly after it was constructed using fix44nos.New(), I wouldn't have encountered this problem at all. But, I had a special situation where the process that constructed the message was different from the quickfix initiator process and so had to transmit the message as bytes over the network to the initiator in order to send it to the counterparty. The other aspect that made this bug particularly difficult was that, in the quickfix.Send() code path, we log the message that we're sending by calling ToApp() before build() is called and unfortunately the output of ToApp() is perfectly fine. So, I was very perplexed when I got back a business message reject from the counterparty. I should have checked the messages table in the database itself.

segfault commented 9 months ago

I'm seeing this type of behavior badly break processing of Fenics and Bloomberg FIX messages. Leading to missing groups or out of order groups that don't validate.

mrefky commented 7 months ago

Thanks for adding the reference to my reported bug. I did not do any changes in the received message and just sent it back to the sender using the same connection and the repeating group tags were missed up.

dbergamin commented 4 months ago

@varunbpatil -- is the issue I reported in #622 the same underlying cause as this? I think you've dug a little more. Encountered it in Bloomberg messages as @segfault suggested.

ackleymi commented 4 months ago

@varunbpatil I believe this is resolved by changes in https://github.com/quickfixgo/quickfix/releases/tag/v0.9.1 and https://github.com/quickfixgo/quickfix/releases/tag/v0.9.2 . Try 0.9.2 and please re-open issue if that is not the case.

mrefky commented 4 months ago

Same thing using github.com/quickfixgo/quickfix v0.9.2 Field NoTradingSessions must preceed Field TradingSessionID;

varunbpatil commented 4 months ago

@ackleymi This has not been fixed. Please re-open the issue. The root cause and the minimum reproducible code has been provided above. I don't think issues should be blindly closed just because you think it may have been fixed.

varunbpatil commented 4 months ago

To any future readers that stumble upon this issue, this is the fix I've been using - https://github.com/quickfixgo/quickfix/commit/b6f77fdd80bbaa7057a90ddbb661c4af652be99e

mrefky commented 4 months ago

Thanks [varunbpatil] I confirm that the above works as expected and resolves the issue.