shabiel / M-Web-Server

A YottaDB- and Caché-compatible HTTP server
Apache License 2.0

Non-UTF-8 byte sequences in body cause crashes #55

Closed: jensli closed this issue 2 years ago

jensli commented 2 years ago

When running GT.M in UTF-8 mode, we experience crashes when HTTP request or response bodies contain non-UTF-8 byte sequences.

This is because, in UTF-8 mode, string functions such as $length and $extract treat such byte sequences as invalid characters and raise errors.

For GT.M, the solution is to use the byte-oriented functions, for example $zlength and $zextract, instead. I will submit a pull request with these changes for GT.M. However, I'm not sure whether these changes break the Caché support, so I don't know if they will be usable as-is. But maybe they can serve as a starting point.

Example:

GTM>w $l($zc(255))
%GTM-E-BADCHAR, $ZCHAR(255) is not a valid character in the UTF-8 encoding form

GTM>w $zl($zc(255))
1
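For readers more familiar with other languages, the same character-versus-byte distinction can be sketched in Python. This is an analogy only, not GT.M code: `byte_length` and `char_length` are hypothetical helpers standing in for $zlength and for $length under the UTF-8 chset, and the `UnicodeDecodeError` raised by strict decoding plays the role of %GTM-E-BADCHAR.

```python
# Analogy only: Python bytes stand in for a raw M string, and strict UTF-8
# decoding stands in for GT.M's character semantics when $ZCHSET is UTF-8.

def byte_length(data: bytes) -> int:
    """Count bytes without inspecting the encoding (like $zlength)."""
    return len(data)

def char_length(data: bytes) -> int:
    """Count characters after strict UTF-8 decoding (like $length in UTF-8 mode).

    Raises UnicodeDecodeError on an invalid byte sequence,
    comparable to %GTM-E-BADCHAR.
    """
    return len(data.decode("utf-8"))

body = b"caf\xc3\xa9"        # valid UTF-8 for "café"
print(byte_length(body))     # 5
print(char_length(body))     # 4

bad = bytes([255])           # the same byte as $ZCHAR(255)
print(byte_length(bad))      # 1
try:
    char_length(bad)
except UnicodeDecodeError as err:
    print("error:", err.reason)  # error: invalid start byte
```

The byte-counting helper never fails, which is why switching to the $z functions avoids the crash when a body may contain arbitrary bytes.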

Environment

M Web Server version: 1.1.2

GTM>w $zversion
GT.M V7.0-000 Linux x86_64
GTM>w $zchset
UTF-8
shabiel commented 2 years ago

@jensli Letting you know that I acknowledge receiving this issue.

This issue is a bit weird, because if you look at line 175, we are specifically expecting to be in M mode at the point this communication happens. Did you try another version of GT.M or YottaDB to see if there is a regression in GT.M?

175  X:%WOS="GT.M" "U %WTCP:(delim=$C(13,10):chset=""M"")" ; VEN/SMH - GT.M Delimiters

As I mentioned to you before, if you are customers, I can spend work time on the issue so that I can get it resolved. For now, I can tell you that your fix is good enough for what you need to do.

shabiel commented 2 years ago

And my apologies... let me thank you for taking the effort to report these issues. They will get fixed in due time.

jensli commented 2 years ago

> This issue is a bit weird, because if you look at line 175, we are specifically expecting to be in M mode at the point this communication happens. Did you try another version of GT.M or YottaDB to see if there is a regression in GT.M?
>
> 175  X:%WOS="GT.M" "U %WTCP:(delim=$C(13,10):chset=""M"")" ; VEN/SMH - GT.M Delimiters

I'm having trouble finding exactly what the device parameter chset="M" does in the documentation, but I guess it works like this: it sets the expected character encoding for the input data. Bytes are read from the device into a string without any checks or transformations, so one byte of input becomes one byte of the resulting string. Later, when that string is passed to $length, the invalid UTF-8 byte sequence is detected and the error is generated.

I basically just know that I got %GTM-E-BADCHAR, and that when I switched from $l to $zl, it worked.

I have only tested with GTM 7.0.

I have made the changes locally in our product, and that solves the problem for us.

Some further observations:

jensli commented 2 years ago

> As I mentioned to you before, if you are customers, I can spend work time on the issue so that I can get it resolved.

We have had an initial discussion with Bhaskar about a support contract. Hopefully, in the autumn we will have time to move forward with that.

Also, we have managed to fix all problems locally, so this is neither a blocker nor urgent for us. I'm mostly reporting it to help the project a little. :)

shabiel commented 2 years ago

Fixed in commit 4f6107a.