signpost corrupts HTTP data on non-ASCII platforms (patch included)

In which environment did the problem appear?
java version "1.6.0"
Java(TM) SE Runtime Environment (build pmz3160sr8-20100409_01(SR8))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 z/OS s390-31 jvmmz3160sr8-20100401_55940 (JIT enabled, AOT enabled)
J9VM - 20100401_055940
JIT  - r9_20100401_15339
GC   - 20100308_AA)
JCL  - 20100408_01

What steps will reproduce the problem?
1. Use the signpost library on a non-ASCII platform such as z/OS, which is an EBCDIC platform.
2. Invoke a method that issues a request, e.g. OAuthProvider.retrieveRequestToken(...).
3. A client-error response (e.g. 401) is received because the request is partially corrupted.

Please post code (fully executable, no pseudo code) that reproduces the issue.

// Groovy code. NB: the issue only reproduces on non-ASCII platforms:
//-------------------------------------------------------------------
import oauth.signpost.*;
import oauth.signpost.basic.*;
import oauth.signpost.http.*;
import oauth.signpost.signature.*;
import oauth.signpost.exception.*;

// Consumer key & secret from dev.twitter.com - replace with your own
OAuthConsumer consumer = new DefaultOAuthConsumer("***", "***");

OAuthProvider provider = new DefaultOAuthProvider(
        "http://twitter.com/oauth/request_token",
        "http://twitter.com/oauth/access_token",
        "http://twitter.com/oauth/authorize");

println "Fetching consumer request tokens..."
// Fails with HTTP response code 401:
def authUrl = provider.retrieveRequestToken(consumer, OAuth.OUT_OF_BAND);
//-------------------------------------------------------------------

Please provide any additional information below.

There are a number of places in the codebase where conversions between byte arrays and Strings occur, for example invocations of new String(byte[]), String.getBytes() and interactions with InputStreamReaders. In many cases no character set is explicitly specified for these conversions, so the platform's default encoding is used. On z/OS, for example, this means an EBCDIC character set is used, which is not ASCII-compatible. The data being handled in signpost is typically HTTP header data, which must be ISO-8859-1 (a superset of ASCII). Therefore, on z/OS, HTTP request data can be corrupted and HTTP response data can be misinterpreted.
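
To illustrate the kind of change involved, here is a minimal sketch of converting header data with an explicit character set instead of the platform default. It is not code from signpost or from the patch: the class HeaderEncoding and its method names are hypothetical, and the IBM1047 comparison only runs if that EBCDIC code page happens to be installed in the JRE.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of signpost.
final class HeaderEncoding {
    // Always convert header data with ISO-8859-1, never the platform default.
    static final Charset HTTP_HEADER_CHARSET = Charset.forName("ISO-8859-1");

    // Replaces value.getBytes(), which would use the default charset.
    static byte[] toHeaderBytes(String value) {
        return value.getBytes(HTTP_HEADER_CHARSET);
    }

    // Replaces new String(raw), which would use the default charset.
    static String fromHeaderBytes(byte[] raw) {
        return new String(raw, HTTP_HEADER_CHARSET);
    }

    public static void main(String[] args) {
        String header = "Authorization: OAuth";
        byte[] explicit = toHeaderBytes(header);
        System.out.println("ISO-8859-1: " + Arrays.toString(explicit));

        // IBM1047 is an EBCDIC code page; it is optional, so check first.
        if (Charset.isSupported("IBM1047")) {
            byte[] ebcdic = header.getBytes(Charset.forName("IBM1047"));
            System.out.println("IBM1047:    " + Arrays.toString(ebcdic));
            System.out.println("Same bytes? " + Arrays.equals(explicit, ebcdic));
        }
    }
}
```

The same idea applies to readers: new InputStreamReader(in, charset) takes an explicit character set, whereas new InputStreamReader(in) falls back to the default.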

A partial workaround is to run with -Dfile.encoding=ISO8859-1, which forces the JVM to default to an ASCII-compatible encoding. However, this is not suitable when other code in the same runtime expects the default encoding to be EBCDIC.
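
To check whether a given runtime is affected, or whether the workaround has taken effect, something like the following sketch can be used. The class DefaultCharsetCheck and its method are hypothetical names for illustration. It reports the JVM's default charset and tests whether a charset encodes a typical request line to the same bytes as US-ASCII.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical diagnostic for illustration only; not part of signpost.
final class DefaultCharsetCheck {
    // True if the charset encodes a sample of ASCII text to the same
    // bytes that US-ASCII would produce.
    static boolean isAsciiCompatible(Charset charset) {
        String probe = "GET / HTTP/1.1";
        byte[] ascii = probe.getBytes(Charset.forName("US-ASCII"));
        // Some charsets are decode-only, so check canEncode() first.
        return charset.canEncode()
                && Arrays.equals(ascii, probe.getBytes(charset));
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + def.name());
        System.out.println("ASCII-compatible: " + isAsciiCompatible(def));
    }
}
```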

I have put a patch up on GitHub. As well as the encoding fixes, it includes a minor fix to the test case OAuthTest.shouldCorrectlyFormEncodeParameters() to prevent it from failing on some JVMs due to its reliance on HashMap iteration order:

http://github.com/rewbs/signpost/commit/f0aa236d214734ab0acebfe4cc40926a032ae88e
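
The patch itself is at the link above and is not reproduced here. As a general illustration of the HashMap-ordering problem, one way to make such a test independent of iteration order is to compare the encoded name=value pairs after sorting them. This is only a sketch under that assumption: the class FormEncodedAssert and its methods are hypothetical and may differ from what the patch actually does.

```java
import java.util.Arrays;

// Hypothetical test helper for illustration only; not taken from the patch.
final class FormEncodedAssert {
    // Splits "a=1&b=2" into its pairs and sorts them, so the result does
    // not depend on the order in which the pairs were emitted.
    static String[] sortedPairs(String formEncoded) {
        String[] pairs = formEncoded.split("&");
        Arrays.sort(pairs);
        return pairs;
    }

    // True if both strings contain the same pairs, in any order.
    static boolean sameParameters(String expected, String actual) {
        return Arrays.equals(sortedPairs(expected), sortedPairs(actual));
    }

    public static void main(String[] args) {
        System.out.println(sameParameters("one=1&two=2", "two=2&one=1"));
        System.out.println(sameParameters("one=1&two=2", "one=1&two=3"));
    }
}
```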
Original issue reported on code.google.com by rewbs.s...@gmail.com on 5 Jun 2010 at 11:06