mojaloop / project

Repo to track product development issues for the Mojaloop project.

Parsing of message body of messages in the ML Switch, JWS implications #755

Closed elnyry-sam-k closed 5 years ago

elnyry-sam-k commented 5 years ago

Summary: The body parsing in ml-api-adapter is breaking JWS! It does not preserve the body byte-for-byte: a string containing an escaped unicode character is being unescaped, which breaks the JWS spec.

For example Payer sends this message: {"errorInformation":{"errorCode":"5200","errorDescription":"Generic limit error, amount \u0026 payments threshold."}}

Payee receives this message: {"errorInformation":{"errorCode":"5200","errorDescription":"Generic limit error, amount & payments threshold."}} See the escaped unicode sequence in the incoming message, this is allowed as per the JSON RFC.

The issue is that the JSON RFC allows escaped unicode characters, and these get unescaped by the default body parsing mechanism in ml-api-adapter (whatever body parser HAPI uses). The unescaping breaks JWS, as forwarded message bodies must be byte-for-byte identical to the incoming bodies. The only way mojaloop can be sure not to break JWS is to retain the incoming byte stream for forwarding at a later point. This is a mojaloop-wide issue, across the entire surface: ml-api-adapter, all the kafka handlers, etc. The switch is not allowed to alter the bodies of any message it forwards, but it does under certain circumstances, so this will require a fix.

Severity: High

Priority: Critical

Expected Behavior: the message body forwarded by the switch is byte-for-byte identical to the incoming body, so the JWS remains valid.

Steps to Reproduce

Incoming message: {"errorInformation":{"errorCode":"5200","errorDescription":"Generic limit error, amount \u0026 payments threshold."}}

Outgoing message: {"errorInformation":{"errorCode":"5200","errorDescription":"Generic limit error, amount & payments threshold."}}

See the escaped unicode sequence in the incoming message, this is allowed as per the JSON RFC.
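The round trip can be reproduced in a few lines of Node.js (a hypothetical illustration, not the actual adapter code):

```javascript
// Incoming body, exactly as received on the wire (note the \u0026 escape).
const incoming =
  '{"errorInformation":{"errorCode":"5200","errorDescription":' +
  '"Generic limit error, amount \\u0026 payments threshold."}}';

// Default body parsing unescapes "\u0026" to "&".
const parsed = JSON.parse(incoming);

// Re-serializing for the outgoing request emits the literal "&",
// so the forwarded bytes no longer match the signed bytes.
const outgoing = JSON.stringify(parsed);

console.log(incoming === outgoing); // false -> JWS signature breaks
```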


adrianhopebailie commented 5 years ago

Canonical encoding of JSON is a known issue. This is why all JOSE specs require payloads to be base64url encoded and the signatures computed against the raw bytes of that encoding.

My suggestion is to preserve the incoming byte stream (probably as a base64-encoded string) along with the payload in all the kafka messages, and use that to reconstruct the body when making the callback, rather than the "payload" kafka message property that is currently used. That would require a coordinated change to the kafka message "protocol"/structure across all the handlers and ml-api-adapter.

This is the approach taken by JOSE but it also introduces a potential security foot-gun. Implementors need to ensure that they validate the signature against the binary version of the data but ALSO ensure that the binary data matches the decoded data used for processing. Just something to be aware of.

The current Signature spec does 90% of JWT but then throws away the base64url encoded payload.

It puts the following header into the request:

{
  "signature": "dz2ntyS0_rDyA0pLeWluG--tBcYYrlvG99ffkXcEBuve5Qzvzyn0ZUi82J7h17RsdfHPuTnbEGvCeU9Y4Bg0nIZHGL4icswaaO09T5hPPYKBTzVQeHkokLmL4dXpHdr1ggSEpu3WEU3nfgOFGGAdOq355i1iGuDbhqm_lSfVHaqdVCEhkJ2Y_r2glO2QpdZrcbvsBV39derj_PlfISBBGjdh0dIPxnFIVcZuPHiq9Ha2MslrBHfqwFfNeU_xhErBd2PywkDQJbKOlfqdkmFC9bS8Ofx0O6Mg7qdFGwQkseJTfp0HMbH1d9e6H0cocY8xfuDNGaZpOJhxiYtiPLg",
  "protectedHeader": "eyJhbGciOiJSUzI1NiIsIkZTUElPUC1EZXN0aW5hdGlvbiI6IjU2NzgiLCJGU1BJT1AtVVJJIjoiL3F1b3RlcyIsIkZTUElPUC1IVFRQLU1ldGhvZCI6IlBPU1QiLCJEYXRlIjoiVHVlLCAyMyBNYXkgMjAxNyAyMToxMjozMSBHTVQiLCJGU1BJT1AtU291cmNlIjoiMTIzNCJ9"
}

If it followed the JWT spec fully, this header would simply be a JWT (i.e. the protected header, payload, and signature, each base64url encoded and separated by a `.`).

Something like:

eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsIkZTUElPUC1EZXN0aW5hdGlvbiI6IjU2NzgiLCJGU1BJT1AtVVJJIjoiL3F1b3RlcyIsIkZTUElPUC1IVFRQLU1ldGhvZCI6IlBPU1QiLCJEYXRlIjoiVHVlLCAyMyBNYXkgMjAxNyAyMToxMjozMSBHTVQiLCJGU1BJT1AtU291cmNlIjoiMTIzNCJ9.eyJhbW91bnQiOnsiYW1vdW50IjoiMTUwIiwiY3VycmVuY3kiOiJVU0QifSwidHJhbnNhY3Rpb25UeXBlIjp7InNjZW5hcmlvIjoiVFJBTlNGRVIiLCJpbml0aWF0b3IiOiJQQVlFUiIsInN1YlNjZW5hcmlvIjoiUDJQIFRyYW5zZmVyIGFjcm9zcyBNTSBzeXN0ZW1zIiwiaW5pdGlhdG9yVHlwZSI6IkNPTlNVTUVSIn0sInRyYW5zYWN0aW9uSWQiOiIzNjYyOWE1MS0zOTNhLTRlM2MtYjM0Ny1jMmNiNTdlMWUxZmMiLCJxdW90ZUlkIjoiNTllMzMxZmEtMzQ1Zi00NTU0LWFhYzgtZmNkODgzM2Y3ZDUwIiwiZXhwaXJhdGlvbiI6IjIwMTctMDUtMjRUMDg6NDA6MDAuMDAwLTA0OjAwIiwicGF5ZXIiOnsicGVyc29uYWxJbmZvIjp7ImRhdGVPZkJpcnRoIjoiMTk4Ni0wMi0xNCIsImNvbXBsZXhOYW1lIjp7Im1pZGRsZU5hbWUiOiJCZW4iLCJMYXN0TmFtZSI6IkxlZSIsImZpcnN0TmFtZSI6IkJpbGwifX0sIm5hbWUiOiJCaWxsIExlZSIsInBhcnR5SWRJbmZvIjp7ImZzcElkIjoiMTIzNCIsInBhcnR5SWRUeXBlIjoiTVNJU0ROIiwicGFydHlTdWJJZE9yVHlwZSI6IlJlZ2lzdGVyZWRDdXN0b21lciIsInBhcnR5SWRlbnRpZmllciI6IjE2MTM1NTUxMjEyIn19LCJwYXllZSI6eyJwYXJ0eUlkSW5mbyI6eyJmc3BJZCI6IjU2NzgiLCJwYXJ0eUlkVHlwZSI6Ik1TSVNETiIsInBhcnR5SWRlbnRpZmllciI6IjE1Mjk1NTU4ODg4In19LCJmZWVzIjp7ImFtb3VudCI6IjEuNSIsImN1cnJlbmN5IjoiVVNEIn0sImV4dGVuc2lvbkxpc3QiOlt7InZhbHVlIjoidmFsdWUxIiwia2V5Ijoia2V5MSJ9LHsidmFsdWUiOiJ2YWx1ZTIiLCJrZXkiOiJrZXkyIn0seyJ2YWx1ZSI6InZhbHVlMyIsImtleSI6ImtleTMifV0sIm5vdGUiOiJ0aGlzIGlzIGEgc2FtcGxlIGZvciBQT1NUIC9xdW90ZXMiLCJnZW9Db2RlIjp7ImxvbmdpdHVkZSI6IjEyNS41MjAwMDEiLCJsYXRpdHVkZSI6IjU3LjMyMzg4OSJ9LCJhbW91bnRUeXBlIjoiUkVDRUlWRSJ9.Xfs9A2tB2XDIUqBfewt8L19y4rD3NyLFJi-y2tnOUcy9vpz8WSXW-3yQU7RiOdjmSYhcpaM6wSVssaimgCSa-ufuFc1eUQEUdB-kiEkh9-_qZHJXOKQRDRV8cz5QoIMsFnGv8sddE2jwE79qOyOblkbOsoaC-yeWb7OQV2i2AaagcpINbInuo9sUVIXzNLso-WqGBHLyZzjw9U0X57TboR076Mjj2W0k8Xk5uz7z2aLBh_IN5yEYJaEXVw0XgekTHpMmZE3R9Ji1hbwf3k63vtaYYGdXClxHTF8Jkbzkm-ESBt61AlZH2IbzuzBEGlV-VAdXFwR-V5SfOOHWVl72PA

See on JWT.io

One thing to note is that we MAY hit limits on header sizes. The HTTP spec doesn't specify limits, but some servers do implement them (the lowest I am aware of is 8 KB), so we're still well within that limit.

mdebarros commented 5 years ago

Currently the ML-API-Adapter parses the incoming message into a JSON Object via the Hapijs library, which is then sent to the node-rdkafka streaming library, and subsequently, each handler either decodes or re-encodes the message from kafka using the same node-rdkafka library.

Messages are being stored on Kafka based on the Lime protocol (https://github.com/mojaloop/central-services-stream/blob/master/src/kafka/protocol/readme.md).

The headers & payload are stored in the content field as per the following example:

{
    "id": "f2f038cc-b749-464d-a364-c24acad58ef0",
    "to": "mockfsp02",
    "from": "mockfsp01",
    "type": "application/json",
    "content": {
        "headers": {
            "accept": "application/vnd.interoperability.transfers+json;version=1",
            "content-type": "application/vnd.interoperability.transfers+json;version=1",
            "date": "2019-01-22T21:27:55.000Z",
            "fspiop-source": "mockfsp01",
            "fspiop-destination": "mockfsp02",
            "content-length": 437
        },
        "payload": {
            "transferId": "f2f038cc-b749-464d-a364-c24acad58ef0",
            "payerFsp": "mockfsp01",
            "payeeFsp": "mockfsp02",
            "amount": {
                "amount": "1",
                "currency": "USD"
            },
            "ilpPacket": "eyJxdW90ZUlkIjoiZThmYzFlYzEtZTM5Yy00NDBjLWJjNmQtZjg1NjM2MzYwMTBiIiwidHJhbnNhY3Rpb25JZCI6ImU4ZmMxZWMxLWUzOWMtNDQwYy1iYzZkLWY4NTYzNjM2MDEwYiJ9",
            "condition": "ta5inyYarm7thvY3DPHZgSDK0ml_kuGOPi-qhiF1SCs",
            "expiration": "2019-12-10T12:07:47.349Z"
        }
    },
    "metadata": {
        "event": {
            "id": "25240fa4-da6a-4f18-8b42-e391fde70817",
            "type": "prepare",
            "action": "prepare",
            "createdAt": "2019-05-06T08:53:16.996Z",
            "state": {
                "status": "success",
                "code": 0
            }
        }
    }
}

An additional field payloadType could be added under the content field to provide the following information:

  1. content type <-- the original representation of the data stored based on Mime type
  2. transfer encoding <-- the encoding used for the message transport which is an additional parameter

payloadType = '<Content-Type>;encode=<Transfer-Encoding>'
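A possible parser for such a value (a hypothetical helper, not part of the current codebase):

```javascript
// Splits e.g. 'application/json;encode=base64' into its two parts:
// the original MIME content type and the transfer encoding.
function parsePayloadType(payloadType) {
  const [contentType, ...params] = payloadType.split(';').map((s) => s.trim());
  const encodeParam = params.find((p) => p.startsWith('encode='));
  return {
    contentType,
    encoding: encodeParam ? encodeParam.slice('encode='.length) : 'none'
  };
}
```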

Here is an example where the payload contains the above JSON which has been encoded as base64:

{
    "id": "f2f038cc-b749-464d-a364-c24acad58ef0",
    "to": "mockfsp02",
    "from": "mockfsp01",
    "type": "application/json",
    "content": {
        "headers": {
            "accept": "application/vnd.interoperability.transfers+json;version=1",
            "content-type": "application/vnd.interoperability.transfers+json;version=1",
            "date": "2019-01-22T21:27:55.000Z",
            "fspiop-source": "mockfsp01",
            "fspiop-destination": "mockfsp02",
            "content-length": 437
        },
        "payload": "ewogICAgICAgICAgICAidHJhbnNmZXJJZCI6ICJmMmYwMzhjYy1iNzQ5LTQ2NGQtYTM2NC1jMjRhY2FkNThlZjAiLAogICAgICAgICAgICAicGF5ZXJGc3AiOiAibW9ja2ZzcDAxIiwKICAgICAgICAgICAgInBheWVlRnNwIjogIm1vY2tmc3AwMiIsCiAgICAgICAgICAgICJhbW91bnQiOiB7CiAgICAgICAgICAgICAgICAiYW1vdW50IjogIjEiLAogICAgICAgICAgICAgICAgImN1cnJlbmN5IjogIlVTRCIKICAgICAgICAgICAgfSwKICAgICAgICAgICAgImlscFBhY2tldCI6ICJleUp4ZFc5MFpVbGtJam9pWlRobVl6RmxZekV0WlRNNVl5MDBOREJqTFdKak5tUXRaamcxTmpNMk16WXdNVEJpSWl3aWRISmhibk5oWTNScGIyNUpaQ0k2SW1VNFptTXhaV014TFdVek9XTXRORFF3WXkxaVl6WmtMV1k0TlRZek5qTTJNREV3WWlKOSIsCiAgICAgICAgICAgICJjb25kaXRpb24iOiAidGE1aW55WWFybTd0aHZZM0RQSFpnU0RLMG1sX2t1R09QaS1xaGlGMVNDcyIsCiAgICAgICAgICAgICJleHBpcmF0aW9uIjogIjIwMTktMTItMTBUMTI6MDc6NDcuMzQ5WiIKfQ==",
        "payloadType": "application/json;encode=base64"
    },
    "metadata": {
        "event": {
            "id": "25240fa4-da6a-4f18-8b42-e391fde70817",
            "type": "prepare",
            "action": "prepare",
            "createdAt": "2019-05-06T08:53:16.996Z",
            "state": {
                "status": "success",
                "code": 0
            }
        }
    }
}

payloadType examples:

Note that the performance overhead of decoding/encoding should also be taken into consideration here. I.e. UTF-8 encode/decode is probably more efficient than base64, since we can take the string and directly parse it into a JSON object as required, whereas base64 requires an additional decode.

This would ensure that the payload could then be validated against the JWS Signature as normal.

We could enhance the existing central-services-stream library to provide the capability to encode/decode the payload based on the payloadType. This would be a minimal change to the existing code-base if this solution is accepted.
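The encode/decode capability could look something like this (a sketch under the proposed payloadType convention, not the actual central-services-stream API):

```javascript
// Store the raw body bytes base64 encoded, tagged with the proposed payloadType.
function encodePayload(rawBody, contentType) {
  return {
    payload: Buffer.from(rawBody).toString('base64'),
    payloadType: `${contentType};encode=base64`
  };
}

// Recover the original bytes intact; callers parse them into an object
// only when needed, keeping the byte stream available for forwarding.
function decodePayload({ payload, payloadType }) {
  return payloadType.endsWith(';encode=base64')
    ? Buffer.from(payload, 'base64')
    : Buffer.from(payload, 'utf8');
}
```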

Let me know what you guys think.

mdebarros commented 5 years ago

Alternatively, we could encode the complete content field and move the encode parameter up to the `type` field, as per the following example:

{
    "id": "f2f038cc-b749-464d-a364-c24acad58ef0",
    "to": "mockfsp02",
    "from": "mockfsp01",
    "type": "application/json;encode=base64",
    "content": "ewogICAgICAgICJoZWFkZXJzIjogewogICAgICAgICAgICAiYWNjZXB0IjogImFwcGxpY2F0aW9uL3ZuZC5pbnRlcm9wZXJhYmlsaXR5LnRyYW5zZmVycytqc29uO3ZlcnNpb249MSIsCiAgICAgICAgICAgICJjb250ZW50LXR5cGUiOiAiYXBwbGljYXRpb24vdm5kLmludGVyb3BlcmFiaWxpdHkudHJhbnNmZXJzK2pzb247dmVyc2lvbj0xIiwKICAgICAgICAgICAgImRhdGUiOiAiMjAxOS0wMS0yMlQyMToyNzo1NS4wMDBaIiwKICAgICAgICAgICAgImZzcGlvcC1zb3VyY2UiOiAibW9ja2ZzcDAxIiwKICAgICAgICAgICAgImZzcGlvcC1kZXN0aW5hdGlvbiI6ICJtb2NrZnNwMDIiLAogICAgICAgICAgICAiY29udGVudC1sZW5ndGgiOiA0MzcKICAgICAgICB9LAogICAgICAgICJwYXlsb2FkIjogewogICAgICAgICAgICAidHJhbnNmZXJJZCI6ICJmMmYwMzhjYy1iNzQ5LTQ2NGQtYTM2NC1jMjRhY2FkNThlZjAiLAogICAgICAgICAgICAicGF5ZXJGc3AiOiAibW9ja2ZzcDAxIiwKICAgICAgICAgICAgInBheWVlRnNwIjogIm1vY2tmc3AwMiIsCiAgICAgICAgICAgICJhbW91bnQiOiB7CiAgICAgICAgICAgICAgICAiYW1vdW50IjogIjEiLAogICAgICAgICAgICAgICAgImN1cnJlbmN5IjogIlVTRCIKICAgICAgICAgICAgfSwKICAgICAgICAgICAgImlscFBhY2tldCI6ICJleUp4ZFc5MFpVbGtJam9pWlRobVl6RmxZekV0WlRNNVl5MDBOREJqTFdKak5tUXRaamcxTmpNMk16WXdNVEJpSWl3aWRISmhibk5oWTNScGIyNUpaQ0k2SW1VNFptTXhaV014TFdVek9XTXRORFF3WXkxaVl6WmtMV1k0TlRZek5qTTJNREV3WWlKOSIsCiAgICAgICAgICAgICJjb25kaXRpb24iOiAidGE1aW55WWFybTd0aHZZM0RQSFpnU0RLMG1sX2t1R09QaS1xaGlGMVNDcyIsCiAgICAgICAgICAgICJleHBpcmF0aW9uIjogIjIwMTktMTItMTBUMTI6MDc6NDcuMzQ5WiIKICAgICAgICB9CiAgICB9",
    "metadata": {
        "event": {
            "id": "25240fa4-da6a-4f18-8b42-e391fde70817",
            "type": "prepare",
            "action": "prepare",
            "createdAt": "2019-05-06T08:53:16.996Z",
            "state": {
                "status": "success",
                "code": 0
            }
        }
    }
}

With the content field containing the following once it has been decoded:

"content": {
    "headers": <headers as received by hapijs>,
    "payload": <raw payload as received by hapijs>
}

I feel like my initial suggested approach is more explicit and would remove any confusion on how to handle the contents, and how the internal payload field should be handled.

adrianhopebailie commented 5 years ago

Note that the performance overhead of decoding/encoding should also be taking into consideration here

I doubt this will be an issue but even if it is I would suggest getting this working first before worrying about optimising the encoding/decoding.

In terms of the suggestions:

If we follow your second suggestion, content will always be raw binary data encoded using base64. If you are just going to put binary data in the LIME envelope, why not encode the whole raw HTTP request/response as the content (i.e. don't parse the headers and body and put them in a new JSON object)? My guess is that this won't be trivial using any HTTP libraries on Node and you will need to use raw TCP sockets, so that may be a good reason, but it seems pointless to parse the HTTP request/response only to re-encode it as a binary blob.

For that reason I prefer your first suggestion, however it would be useful to have some explicit header normalisation rules (I didn't see any in the Signature spec) to ensure that the decoded headers are still used correctly when validating the signature.

Then I would do the following:

  1. Treat all HTTP request/response bodies as binary data and base64url encode these as the value of content.
  2. Look at the content-type header to determine the MIME type of the payload rather than repeat it in a payloadType property.
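Sketched with hypothetical values, suggestion (1) amounts to the following:

```javascript
// The raw HTTP body bytes, exactly as received, become the content value.
const rawBody = Buffer.from(
  '{"errorInformation":{"errorCode":"5200","errorDescription":' +
  '"Generic limit error, amount \\u0026 payments threshold."}}'
);
const content = rawBody.toString('base64url');

// The callback body is later rebuilt from the exact original bytes, so
// the JWS stays valid; the MIME type still comes from the content-type header.
const restored = Buffer.from(content, 'base64url');
console.log(restored.equals(rawBody)); // true
```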
mdebarros commented 5 years ago

Note that the performance overhead of decoding/encoding should also be taking into consideration here

I doubt this will be an issue but even if it is I would suggest getting this working first before worrying about optimising the encoding/decoding.

Agreed.

I prefer your first suggestion, however it would be useful to have some explicit header normalisation rules (I didn't see any in the Signature spec) to ensure that the decoded headers are still used correctly when validating the signature.

That is correct. I do not know of any header normalization rules in the spec other than what is validated against the API specification explicitly, nor do the headers form part of the calculated signature (other than the header that stores the signature itself). Do you have an example of how we could bridge this gap?

Then I would do the following:

  1. Treat all HTTP request/response bodies as binary data and base64url encode these as the value of content.

Just to clarify, do you mean encode the entire value of the content field or each of the fields within the content separately (i.e. headers & payload)?

  1. Look at the content-type header to determine the MIME type of the payload rather than repeat it in a payloadType property.

Fair enough. If your 1st point above applies to both the headers and payload fields, then I would suggest that this field is more appropriately named encodeType, which would be applicable to both fields.

bushjames commented 5 years ago

All, I think it would be a mistake to prematurely generalise. Let us not worry about building a solution to any possible incoming MIME type. The API specifies UTF-8 encoded JSON only and that is highly unlikely to change any time soon. Let us worry about fixing the situation for the current spec and stack first.

We have a node.js front-end that can be configured to not parse incoming payloads using the default node JSON parser and simply give our code the raw bytes. So, we could stop parsing the bodies in the HAPI framework and do it explicitly in our code; that gives us more control of the parsing. It also allows us to store the bytestream intact before parsing.
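In Hapi this is route-level configuration; roughly (a sketch, assuming Hapi's documented payload options, with a hypothetical handler helper):

```javascript
// Route options telling hapi to hand over the raw bytes instead of parsing.
const routeOptions = {
  payload: {
    parse: false,  // do not run the default JSON body parser
    output: 'data' // deliver the body as a Buffer in request.payload
  }
};

// Parsing then becomes explicit in our code, and the byte stream survives.
function handleRawBody(rawPayload /* Buffer */) {
  const body = JSON.parse(rawPayload.toString('utf8')); // for processing
  return { rawPayload, body }; // rawPayload kept for byte-identical forwarding
}
```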

All that may be academic, however, as @adrianhopebailie raises a very serious security concern. Storing and re-transmitting the incoming bytestream opens the door to us possibly passing on a message that is not the same as our parsed interpretation. If our parser has such a flaw then we are already very broken indeed, but at least the eventual receiver would know, given that our current approach of reconstructing the outgoing message as JSON from our js objects would invalidate the signature in this situation.

If my interpretation of @millerabel 's recent observations is correct, he suggests a change to the spec to eliminate the cases that cause signature breaks during payload reconstruction e.g. whitespace, ordering of lists and object properties, and unicode escaping. I suggest we give this suggestion a significant amount of credence and consideration as I believe it is the only solution that prevents a possible parser defect being exploited.

lewisdaly commented 5 years ago

It looks like we could also build our own body parser middleware if needed, or take this one: https://github.com/nasa8x/hapi-bodyparser and customize it as needed.

vgenev commented 5 years ago

elnyry-sam-k commented 5 years ago

This is now implemented and deployed on AWS. To keep it manageable, closing this out - for any new issue or regression, please log a new bug