gravitee-io / issues

Gravitee.io - API Platform - Issues
64 stars 26 forks source link

Binary data gets mangled when using URL rewriting or assign content policy #9132

Open pcgaustad opened 1 year ago

pcgaustad commented 1 year ago

Describe the bug

Binary data gets mangled when using a URL rewriting or assign content policy. This goes for whatever data the policy applies to, whether it is in the request or response. In this report I describe the bug for a response with a URL rewriting policy.

To Reproduce

  1. Create a new API with a minimal plan and a backend that can respond with binary data. Do not apply any policies.

  2. Send a request which will make the backend return some predefined binary data. In my case \x00\x00\x00\x00\xff\xff\xff\xff

  3. Save the binary data somehow, e.g. with curl --output test.bin "https://example.com"

  4. Compare the binary data in the response with the predefined data, e.g. with hexdump, and observe that it equals the predefined data.

  5. Apply a URL rewrite policy for header and body, as simple as it gets, as in the image below.

  1. Repeat step 2 and 3.

Expected behaviour

The binary data in the response equals the predefined binary data, that is \x00\x00\x00\x00\xff\xff\xff\xff.

Current behaviour

The binary data in the response is \x00\x00\x00\x00\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd. The \xff get turned into 'REPLACEMENT CHARACTER' (U+FFFD).

Useful information

I have also verified that binary data don't get mangled when the client communicates directly with the endoint.

I have not observed any problems with data in utf-8 encoding.

I have only been able to do tests with a SOAP API. As the HTTP headers seem pretty standard I assume the bug is unrelated to the protocol.

cURL verbose output:

< HTTP/2 200 < cache-control: private < content-type: multipart/related; type="application/xop+xml";start="<http://tempuri.org/0>"; boundary="uuid:807159d0-9955-43f4-9c6a-2e83d5d5a3ec+id=486";start-info="text/xml" < date: Tue, 11 Jul 2023 11:09:13 GMT < mime-version: 1.0 < server: Microsoft-IIS < set-cookie: ASP.NET_SessionId=s3g1xlx1czihcxjochdqujxi; path=/; HttpOnly; SameSite=Lax; Sec ure < x-gravitee-request-id: 8ce96063-3290-4767-a960-633290c76764 < x-gravitee-transaction-id: 8ce96063-3290-4767-a960-633290c76764 < x-content-type-options: nosniff

Response data:

Content-ID: <http://tempuri.org/0> Content-Transfer-Encoding: 8bit Content-Type: application/xop+xml;charset=utf-8;type="text/xml"

](http://schemas.xmlsoap.org/soap/envelope/%22%3E)](http://www.gecko.no/ephorte/services/documents/v3%22/%3E)\example\test.bin](http://www.gecko.no/ephorte/services/documents/v3%22%3E%5C%5Cexample%5Ctest.bin)</h:FileName></s:Header>](http://www.w3.org/2004/08/xop/include%22/%3E)</s:Body></s:Envelope>

Content-ID: <http://tempuri.org/1/638246777541059478> Content-Transfer-Encoding: binary Content-Type: application/octet-stream

[binary data]

Environment

APIM 3.15.21 (due to #9026)

stale[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

glorytiger commented 7 months ago

This bug happens here as well for APIM 3.15.21

Created a API with URL Rewriting policy and a echo backend.

With URL Rewriting enabled:

printf '\x00\x00\x00\x00\xff\xff\xff\xff' | curl -H "Content-Type:application/octet-stream" --data-binary @- https://<URL_TO_API> --output a.bin
hexdump -C a.bin
00000000  00 00 00 00 ef bf bd ef  bf bd ef bf bd ef bf bd  |................|
00000010

With URL Rewriting disabled:

printf '\x00\x00\x00\x00\xff\xff\xff\xff' | curl -H "Content-Type:application/octet-stream" --data-binary @- https://<URL_TO_API> --output b.bin
hexdump -C b.bin
00000000  00 00 00 00 ff ff ff ff                           |........|
00000008
phiz71 commented 6 months ago

Hello @pcgaustad and @glorytiger, first I would like to apologize for this late answer.

The purpose of the URL rewriting policy is to replace URL strings in headers or in the body by an other URL. Since this replacement mechanism relies on regex, the policy assumes that the body content is a string and does not expect binary content.

If you just want to replace URLs in headers, you can unselect Rewrite HTTP response body option in the policy configuration.

stale[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

glorytiger commented 3 months ago

Hi @phiz71 and thank you for the response.

I see that we didn't mention that this applies also if the Rewrite HTTP response body is turned off.

printf '\x00\x00\x00\x00\xff\xff\xff\xff' | curl -H "Content-Type:application/octet-stream" --data-binary @- https://gw-qa.intark.uh-it.no/cn-test/url-rewrite-binary --output a.bin && hexdump -C a.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    24    0    16  100     8     92     46 --:--:-- --:--:-- --:--:--   139
00000000  00 00 00 00 ef bf bd ef  bf bd ef bf bd ef bf bd  |................|
00000010

A

phiz71 commented 3 months ago

Hi, could you remind me the version of APIM you are using ? Have you tested with the latest versions available ? Latest 3.x is 3.20.32 Latest 4.x is 4.3.1

glorytiger commented 3 months ago

Hi, I tested with version 3.20.32 and was unable to reproduce the issue with Rewrite HTTP response body turned off. I also failed to reproduce it in the environment from yesterday (3.20.22).

This makes me believe that I might have done an error in my last test from yesterday; namely pass the curl command before the test API was updated. It seems it is working like you described in your previous answer.

stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.