durban opened 8 years ago
That's a known behavior. Is there some problem with it?
Yes. For example, MessageFactoryImpl.createRequest refuses to parse a perfectly legal String like this one (if the platform encoding is, e.g., US-ASCII):
PUBLISH sip:bob@biloxi.example.com SIP/2.0
Call-ID: 35516046df6aa32736ef49c28e98de93@127.0.0.1
CSeq: 1 PUBLISH
From: "Alice" <sip:alice@atlanta.example.com>;tag=9fxced76sl
To: "Bob" <sip:bob@biloxi.example.com>
Max-Forwards: 70
Content-Type: text/html
Content-Length: 4
éű
The exception is like this:
java.text.ParseException: Invalid content length 2 / 4
at gov.nist.javax.sip.message.SIPMessage.setMessageContent(SIPMessage.java:1373)
at gov.nist.javax.sip.parser.StringMsgParser.parseSIPMessage(StringMsgParser.java:204)
at gov.nist.javax.sip.message.MessageFactoryImpl.createRequest(MessageFactoryImpl.java:736)
...
(I guess the content length mismatch is due to the 2-byte characters being replaced by 1-byte question marks.)
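The mismatch in the exception can be reproduced without jain-sip at all. A minimal sketch, assuming the body was originally encoded as UTF-8 (which the declared Content-Length of 4 suggests); US-ASCII is requested explicitly here to stand in for the platform default, and the class and method names are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ContentLengthMismatch {
    // Bytes the body occupies once encoded with the given charset. This byte
    // count, not the character count, is what Content-Length declares.
    static int encodedLength(String body, Charset charset) {
        return body.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file's
        // encoding: U+00E9 (e with acute) and U+0171 (u with double acute).
        String body = "\u00e9\u0171";

        System.out.println("chars:          " + body.length());                               // 2
        System.out.println("UTF-8 bytes:    " + encodedLength(body, StandardCharsets.UTF_8)); // 4

        // Neither character exists in US-ASCII, so getBytes silently substitutes
        // the charset's replacement byte '?' (63) for each one: 2 bytes, not 4.
        byte[] ascii = body.getBytes(StandardCharsets.US_ASCII);
        System.out.println("US-ASCII bytes: " + Arrays.toString(ascii));                   // [63, 63]
    }
}
```

The "2 / 4" in the ParseException lines up with these numbers: 2 bytes actually present after substitution, 4 bytes declared.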
Sure, but this is expected, since we have no way of knowing the correct encoding. If we hardcode something, it will fail again as soon as another encoding comes up. Thus it's up to the sysadmin to set the correct system encoding, no?
Oh, I think I'm starting to understand the problem. Correct me if I'm wrong, but since the content length is in bytes, the createRequest method actually has no way of unambiguously parsing the String. Thus, it assumes the platform encoding. (Although whether that is a good default is debatable, I think.)
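To make the ambiguity concrete: the same two-character String is a consistent body for more than one Content-Length, depending on which charset the sender used. A hypothetical check (not part of jain-sip), using ISO-8859-2 as an example charset that contains both characters as single bytes; it is normally present in a standard JDK, but it is not one of the charsets every Java implementation is required to support:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AmbiguousContentLength {
    // Hypothetical check: would a sender using `charset` have declared
    // `declared` as the Content-Length of this body?
    static boolean consistent(String body, int declared, Charset charset) {
        return body.getBytes(charset).length == declared;
    }

    public static void main(String[] args) {
        String body = "\u00e9\u0171"; // U+00E9 U+0171, the body from the example
        Charset latin2 = Charset.forName("ISO-8859-2");

        // UTF-8 encodes each character as 2 bytes, so Content-Length: 4 fits ...
        System.out.println(consistent(body, 4, StandardCharsets.UTF_8)); // true
        // ... but Latin-2 maps both characters to single bytes (0xE9, 0xFB),
        // so the very same String also fits Content-Length: 2.
        System.out.println(consistent(body, 2, latin2));                 // true
        // The String itself carries no hint about which one the sender meant.
        System.out.println(consistent(body, 2, StandardCharsets.UTF_8)); // false
    }
}
```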
This would mean that the createRequest method cannot be implemented correctly (by "correctly" I mean returning a correctly parsed message on any JVM). If that is so, then the current behavior is unfortunate but understandable. (Another question, then, is why that method is in the API at all.)
I guess we have to find a way to parse directly from bytes to get deterministic behavior... All right, thanks for your help!
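A minimal sketch of what parsing from bytes could look like, assuming a raw message with CRLF line endings: find the blank line, decode only the header section (as UTF-8, which RFC 3261 specifies for SIP text), and cut the body by its declared byte length so no charset is involved. This is not jain-sip's parser; the class, the Split holder, and the simplified Content-Length handling (no compact "l" form, no duplicate or malformed header checks) are assumptions for illustration. Decoding the body itself is left to the caller, e.g. based on the charset parameter of Content-Type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteLevelSplit {
    // Hypothetical holder: decoded header text plus the untouched body bytes.
    static final class Split {
        final String headers;
        final byte[] body;
        Split(String headers, byte[] body) { this.headers = headers; this.body = body; }
    }

    // Index of the first CRLFCRLF, or -1 if the header section is unterminated.
    static int headerEnd(byte[] raw) {
        for (int i = 0; i + 3 < raw.length; i++) {
            if (raw[i] == '\r' && raw[i + 1] == '\n' && raw[i + 2] == '\r' && raw[i + 3] == '\n') {
                return i;
            }
        }
        return -1;
    }

    static Split split(byte[] raw) {
        int end = headerEnd(raw);
        if (end < 0) throw new IllegalArgumentException("no blank line after headers");
        // Assumption: header text is UTF-8, as RFC 3261 specifies for SIP.
        String headers = new String(raw, 0, end, StandardCharsets.UTF_8);
        int contentLength = 0;
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        int bodyStart = end + 4;
        if (raw.length - bodyStart < contentLength) throw new IllegalArgumentException("truncated body");
        // The body is cut by byte count, so the platform encoding never matters here.
        byte[] body = Arrays.copyOfRange(raw, bodyStart, bodyStart + contentLength);
        return new Split(headers, body);
    }

    public static void main(String[] args) {
        String text = "PUBLISH sip:bob@biloxi.example.com SIP/2.0\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: 4\r\n"
                + "\r\n"
                + "\u00e9\u0171";
        // Encoding happens once, explicitly, where the bytes are produced.
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        Split s = split(raw);
        System.out.println(s.body.length); // 4
    }
}
```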
This line calls String#getBytes without specifying an encoding. This can cause problems depending on the default encoding of the platform. For example, if the default encoding is US-ASCII, the getBytes call will replace some (non-representable) Unicode characters with question marks. See also RestComm/jain-sip/issues/111.
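Two usual ways around an unqualified getBytes call, sketched below: pass an explicit charset so the result no longer depends on the platform, and, where silent '?' substitution is worse than an error, encode through a CharsetEncoder configured to REPORT unmappable characters. UTF-8 is only the charset assumed here, and StrictEncoding/encodeStrict are hypothetical names, not the fix that was applied to jain-sip:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncoding {
    // Unlike String#getBytes, which silently substitutes '?' for unmappable
    // characters, an encoder set to REPORT throws instead.
    static byte[] encodeStrict(String s, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String body = "\u00e9\u0171";
        // Explicit charset: the same 4 bytes on every JVM, whatever the default.
        System.out.println(encodeStrict(body, StandardCharsets.UTF_8).length); // 4
        try {
            encodeStrict(body, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            System.out.println("US-ASCII cannot encode the body: " + e);
        }
    }
}
```

With the strict variant, a body that cannot be represented fails loudly at encoding time instead of surfacing later as a confusing "Invalid content length" ParseException.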