Support utf8 encoding - Githubissues

garronej commented 7 years ago

Feature support of UTF8 encoding

garronej commented 7 years ago

On the current relase tests fail. I fixed that :

garronej commented 7 years ago

Note: I deleted mpart01 from the test samples because this sip stack implementation does not handle binary coded payload.

teifip commented 7 years ago

I'm trying to use this SIP stack to send and receive SMS through method MESSAGE with content-type application/vnd.3gpp.sms. I see that the content gets parsed with all MSBs set to 0 (7-bit). For example, using hex representation, I receive "00170007112141070700701B01160B1121316216137700000F48321B7D0641436F761814134D00" instead of "00170007912141070700F01B01160B912131629613F700000FC8329BFD0641C36F76181493CD00". Is there a particular reason why the stack has been designed with this restriction? Thanks!

garronej commented 7 years ago

@teifip Hello,

No there is no reason, it is a design flaw. This SIP stack assume that messages content are ascii encoded buffers regardless of the conten-type header value. As a result, as soon as it has to deal with content of other types it corupt the message. I have submited this pull request so that the stack can handle text content containing non ascii characters but the stack still corupt abritrary binary body.

This test confirm that it is indeed what cause your problem:

'use strict';

let sent= "00170007912141070700F01B01160B912131629613F700000FC8329BFD0641C36F76181493CD00";
let received= "00170007112141070700701B01160B1121316216137700000F48321B7D0641436F761814134D00";

let b= new Buffer(sent, "hex");

let bAsStr= b.toString("ascii");

let b_= new Buffer(bAsStr, "ascii");

if( b_.toString("hex").toUpperCase() === received ){

        console.log("Encoding is indeed the problem here!")

}

Your options are:

Use an other SIP stack
Modify this one so it deal with content as Buffer
Send your SMS PDU as HEX string => you can not use the application/vnd.3gpp.sms content type.

Regards

teifip commented 7 years ago

@garronej Thanks a lot for your prompt and precise answer! Much appreciated.

I've started to modify the stack to get it working also with binary content. For the moment I've done it only for UDP, but the extension to the other transports should not be a problem. Unfortunately, the changes needed are a bit more than what makes sense to push to the library. Once I'm done, I'll share them with a fork. Here is a high-level summary:

The new logic assumes that message.content is passed as a Buffer when it needs to be treated as binary content;
The function stringify is modified so that it supports two mode of operations, one for preparing the messages to be sent, the other for displaying them as a string;
The exports.stringify is modified so that when called from the application the function is always used for display purposes; in this case binary content is converted to hex so that it can be nicely captured in logs;
Instead, when used from within the library, the stringify function returns a Buffer rather than a string, and all the send functions (for all the different transports) are modified to take the Buffer directly;
The parseMessage function is modified to split between header and content before converting to string; also, the function now considers content-type; if content type is one that requires handling as binary (application/vnd.3gpp.sms in my case) then the parsing returns message.content as Buffer; otherwise, it returns message.content as string.

Again, thanks.

garronej commented 7 years ago

Those changes look fine to me :)

Exept that this stack is also used as a low level library for handleing sip message, stringify is not only used for display purpose so the internal stringify should be exported as well under an other name like toBuffer or something like that.

I think that the export.stringify function should decode the content as HEX only if the content-type state that it is nesesary because in 99% of the usecase it will be text based content that can be decoded in utf8.

Otherwise this is how I would have done it !

If possible use the old syntax for buffer using "new Buffer" instead of "Buffer.from" otehrwise it will break the stack for old version of node ( c.f. raspberypi )

If you plan to support statefull connection as well be carefull the streamParser function will have to be modified quite a lot. Unlike with udp a single sip message can be arbitrarly splited among chunks of data. Therefore content-length must be whisely used. ( Not like in the current implementation where non ascii message content frequentely overlape )

Of course those are not request but advise, do what you want with them :)

Good luck!

PS: @kirm Hello, Is there any reason why my changes are not merged? This PR does not change the API nor break anything for anyone but widely expad the relyability of the stack. Regards

garronej commented 7 years ago

Hi @kirm,

No worry thanks already for reviewing.

I changed the function so it successfully parse AoR like <sip:bob@example.com;foo="bar>baz">. Meaning that it is okay to have the '>' character in a param value as long as the param value start and end by the double quote character.
I removed the mpart01 test because, as discussed with @teifip this SIP stack, by design, can't handle content that are not text based therefore this test was irrelevent.

Regargs,

kirm commented 7 years ago

Ok
Well mpart01 doesn't test the ability to handle binary contents, it test the ability to handle multipart messages - for example parser shouldn't be confused by presence of content-type headers inside content

garronej commented 7 years ago

@kirm, I see what you mean but please have a look at the input value (mpart01.dat) and compare it with the expected value ( mpart01.json ) you will understand why this test is non sense.

mpart01.dat:

mpart01.json:

See the content as present in the mpart01.json is corrupted regarding the input mpart01.dat. How could it not be as the content is not some text ?

If you want to test this stack against what you are saying a new test must be writen but it is, I think, unessesary as the content is never parsed.

Regards

kirm commented 7 years ago

Yes content is never parsed, but the was bugs in this stack that this particular test has caught. Namely it mistook \r\n separator inside the content as content/headers separator

On 1 Sep 2017, at 11:37, Garrone Joseph notifications@github.com wrote:

@kirm, I see what you mean but please have a look at the input value (mpart01.dat) and compare it with the expected value ( mpart01.json ) you will understand why this test is non sense.

mpart01.dat:

mpart01.json:

See the content as present in the mpart01.json is corrupted regarding the input mpart01.dat. How could it not be as the content is not some text ?

If you want to test this stack against what you are saying a new test must be writen but it is, I think, unessesary as the content is never parsed.

Regards

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

garronej commented 7 years ago

@kirm,

Ok, I created the test crlfincontent that aim to test the parser against what you are describing but without involving binary data.

Regars

kirm commented 7 years ago

Well, it is better to just return mpart01 to its place. What parser does to content is irrelevant in this case.

garronej commented 7 years ago

But it's the same message, I just removed the binary data.

And I think it's relevent as someone could make a valid change to the stack and this test may fail as the expected value does not describe what we realy expect.

garronej commented 7 years ago

@kirm This was definitely the best way to deal with the issue. I happily close this PR :)

@teifip krim came out with a simple solution that solve everything, you should not have to work on your fork anymore.

Regards

teifip commented 7 years ago

@garronej Thanks for your attention to my case. Yes, I confirm. The changes work perfectly with my use cases with application/vnd.3gpp.sms content.

@kirm Thanks a lot for your work. Much appreciated! Are you going to publish the modified library to npm?

kirm commented 7 years ago

Yes, but i'd like to test it a bit more before pushing to npm. So in couple of days probably

kirm / sip.js

Support utf8 encoding #120