taniwallach closed this issue 5 years ago.
Below is a sample Perl script which tests the basic idea of the code change. It applies `utf8::encode()` to a value which has UTF8 data in it, which breaks the Net::OAuth verification (reproducing the reported problem). Then the code added to `webwork2/lib/WeBWorK/Authen/LTIAdvanced.pm` can be run to "fix" the message, after which verification passes again. This seems to be a good indication both that this is the setting triggering the bug, and that the new code should fix it.
#!/usr/bin/perl -w
use strict;
use warnings;
use utf8;

use Net::OAuth;

my $HebrewString = "חסר קורס מהרשימה";

# ======================================================================

my $request_url     = 'http://testing.123/foo';
my $consumer_key    = 'my_sample_consumer_key';
my $consumer_secret = '123456789';

my %defaults = (
    nonce            => int( rand( 2**32 ) ),
    timestamp        => time,
    consumer_key     => $consumer_key,
    consumer_secret  => $consumer_secret,
    # callback       => $self->callback,
    signature_method => 'HMAC-SHA1',
    request_method   => 'GET',
);
$defaults{protocol_version} = Net::OAuth::PROTOCOL_VERSION_1_0A;

# ======================================================================

my $request = Net::OAuth->request('consumer')->new(
    %defaults,
    request_url  => $request_url,
    extra_params => {
        samplePayload => "This has Hebrew in UTF8 in it $HebrewString",
        foo           => 'bar',
    },
);
$request->sign;
my $post_body = $request->to_post_body;    # an application/x-www-form-urlencoded POST body
print "CREATED $post_body\n\n";

# ======================================================================
# Try to verify it

my %request_hash = ( %{ $request->to_hash } );
my $test_request;
$test_request = Net::OAuth->request("request token")->from_hash(
    \%request_hash,
    request_url     => $request_url,
    request_method  => "GET",
    consumer_secret => $consumer_secret,
);
print "About to decode the OAuth request\n";
my $decode_body = $test_request->to_post_body;    # an application/x-www-form-urlencoded POST body
print "VERIFY ORIGINAL $decode_body\n";
if ( $test_request->verify ) {
    print "\tOAuth verification passed\n\n\n";
} else {
    print "\tOAuth verification FAILED\n\n\n";
}

# ======================================================================
# Force the samplePayload field into UTF-8 BEFORE it gets sent to OAuth for handling.
# This intentionally breaks it.

print "\n\nWe now BREAK the samplePayload field by encoding it into UTF8 instead of Perl characters. This will break the verification and trigger the \"multi-byte character\" warning message from Net::OAuth.\n\n\n\n";
utf8::encode( $request_hash{samplePayload} );
$test_request = Net::OAuth->request("request token")->from_hash(
    \%request_hash,
    request_url     => $request_url,
    request_method  => "GET",
    consumer_secret => $consumer_secret,
);
print "About to decode the OAuth request\n";
$decode_body = $test_request->to_post_body;    # an application/x-www-form-urlencoded POST body
print "VERIFY TAMPERED $decode_body\n";
if ( $test_request->verify ) {
    print "\tOAuth verification passed\n\n\n";
} else {
    print "\tOAuth verification FAILED\n\n\n";
}

# ======================================================================
# Try to fix any fields which seem to have UTF8 already

print "\n\nWe now attempt to fix the message using the code added to webwork2/lib/WeBWorK/Authen/LTIAdvanced.pm after setting up a suitable set of configuration settings for it to be used.\n\n\n\n";

# First put in some things, so we can test with the code added to
# webwork2/lib/WeBWorK/Authen/LTIAdvanced.pm
my %ce;
$ce{allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters} = 1;
my $ce = \%ce;

sub debug { print( my $msg = shift ); }

# The code below is that added to webwork2/lib/WeBWorK/Authen/LTIAdvanced.pm
# except for some leading spaces deleted from the first and last line.
if ( defined( $ce->{allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters} )
    and ( $ce->{allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters} == 1 ) ) {
    my ( $key, $val );
    foreach $key ( keys(%request_hash) ) {
        $val = $request_hash{$key};
        if ( $val =~ /[\x80-\xFF]/ and !utf8::is_utf8($val) ) {
            debug("Key ${key} : Detected what appears to be UTF8 multi-byte characters which would have been passed to OAuth and triggered an error. Applying utf8::decode to the relevant value.\n");
            utf8::decode( $request_hash{$key} );
        }
    }
}
$test_request = Net::OAuth->request("request token")->from_hash(
    \%request_hash,
    request_url     => $request_url,
    request_method  => "GET",
    consumer_secret => $consumer_secret,
);
print "About to decode the OAuth request\n";
$decode_body = $test_request->to_post_body;    # an application/x-www-form-urlencoded POST body
print "VERIFY FIXED TAMPERED $decode_body\n";
if ( $test_request->verify ) {
    print "\tOAuth verification passed\n\n\n";
} else {
    print "\tOAuth verification FAILED\n\n\n";
}
Hi Tani,
Could you rebase your branch LTI-patch-multi-byte-characters as a branch off develop and then submit it as a pull request to develop? I don't think we'll be able to handle this as a hot-fix. Only a few sites are using multibyte characters, and they can follow your analysis for themselves. If we have a pull request to develop, I think we may be able to get a few more testers.
Done: Instead of rebasing using git, I merged the modified code into a clean copy of develop. https://github.com/openwebwork/webwork2/pull/931
@taniwallach : before trying to set up an environment to test everything... I just want to be sure that I'm getting it right. What you are trying to do with line L467 is to match printable characters?
Following https://github.com/php-edifact/edifact/issues/20 ... are you sure that we are able to correct the evil characters like ö, ç, etc.?
@taniwallach @mgage : if I got right what I said in the last comment...
Following https://perldoc.perl.org/perluniintro.html#Unicode-and-EBCDIC (see the Miscellaneous section) and the character lists... if I'm getting it right, one should add the interval \N{U+00C0} - \N{U+02AF}... well, something like that.
What I am trying to do is detect what may be UTF8-encoded bytes which should not be in that format (but rather in Perl's character format) when passed into OAuth for creating/checking a signature.
The conditional code is exactly the test inside the `encode` function in `Net/OAuth/Message.pm` which triggers the error message reported in the forum posts listed above (the posts which led to my effort to fix the problem):
Net::OAuth warning: your OAuth message appears to contain some multi-byte characters that need to be decoded via Encode.pm or a PerlIO layer first. This may result in an incorrect signature. at /usr/share/perl5/vendor_perl/Net/OAuth/Message.pm line 106.
If Perl sees characters with the high bit set, but thinks the string already consists of valid Perl characters, the second test will fail and nothing will get changed. However, when the `utf8::is_utf8($str)` test fails, the value seems to be a byte array containing UTF8 bytes, in which case `utf8::decode()` is applied.
The test code outside WW did `utf8::encode($request_hash{samplePayload});` to break the OAuth payload for testing purposes, in which case the OAuth verification fails; but when we apply the proposed fix, it passes again.
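The distinction the patch relies on can be illustrated outside Perl. Below is a small Python sketch (illustrative only, not WeBWorK code): Python's bytes/str split plays the role of Perl's UTF8 byte strings vs. character strings, and the variable names are invented for the example.

```python
s = "חסר קורס"           # a decoded character (text) string
b = s.encode("utf-8")    # the same text as raw UTF-8 bytes

# The patch's heuristic: bytes in the 0x80-0xFF range in a value that is
# not marked as text suggest UTF-8 bytes that still need decoding.
assert any(byte >= 0x80 for byte in b)

# The utf8::decode() step turns those bytes back into text.
assert b.decode("utf-8") == s

# Pure-ASCII values survive an encode/decode round trip unchanged,
# which is why ASCII-only sites never noticed the problem.
ascii_val = "my_sample_consumer_key"
assert ascii_val.encode("utf-8").decode("utf-8") == ascii_val
```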
Note: The data in the request hash which is being used to create a "message" for OAuth to verify is pulled out of the HTTP request parameters via the code:
my %request_hash;
my @keys = keys %{ $r->{paramcache} };
foreach my $key (@keys) {
    $request_hash{$key} = $r->param($key);
    #debug("$key -> |" . $requestHash->{$key} . "|");
}
so each value is apparently a string of bytes and not a Perl Unicode string. As I understand it, that is the root cause of the problem. The fix is intended to apply utf8::decode() when needed to transform the sequence of UTF8 bytes into a sequence of Unicode characters.
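As a sketch of the root cause in Python terms (the dictionary and parameter names here are invented for illustration, not the actual WeBWorK API): the parameter values arrive as raw bytes, and the fix amounts to decoding each byte value into text before it reaches the signing code.

```python
# Parameters as they arrive off the wire: raw UTF-8 bytes.
raw_params = {
    "lis_person_name_full": "חסר קורס".encode("utf-8"),
    "oauth_consumer_key": b"my_sample_consumer_key",
}

# The fix's intent: decode each byte value into a Unicode text string
# before it is handed to the OAuth signing/verification code.
request_hash = {key: val.decode("utf-8") for key, val in raw_params.items()}

assert request_hash["lis_person_name_full"] == "חסר קורס"
assert request_hash["oauth_consumer_key"] == "my_sample_consumer_key"
```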
I did some more reading/review about Perl and Unicode.
In short:
Since ASCII characters (code points 0-127) are represented identically in ASCII and UTF-8, it seems that the LTI authentication code worked well enough so long as all the data used only such characters, and applying utf8::decode() to those values would not make any difference.
Thus, it could be that we should be applying utf8::decode() to all the parameter values before putting them into the %request_hash. Quite possibly we should first check that the encoding used by the request is actually UTF-8 (which seems to be the current standard) before assuming that to be the case. Even better would be to determine the actual request encoding, and then decode all parameters with the encoding actually used. I'm not sure if parameter ("key") names also need to be decoded, but at least for the LTI standard and common practice these seem to stay within the ASCII range of characters.
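The "decode everything up front, using the request's actual charset" idea from the previous paragraph might look roughly like the following Python sketch (a hedged illustration: the function name, the Content-Type parsing, and the parameter names are all assumptions, not WeBWorK code).

```python
def decode_params(raw_params, content_type):
    """Decode every parameter value using the charset the request declares,
    falling back to UTF-8, today's de-facto standard."""
    charset = "utf-8"
    # Look for a "charset=..." attribute after the media type.
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            charset = value.strip('"')
    return {key: val.decode(charset) for key, val in raw_params.items()}

params = decode_params(
    {"lis_person_name_full": "חסר קורס".encode("utf-8")},
    "application/x-www-form-urlencoded; charset=UTF-8",
)
assert params["lis_person_name_full"] == "חסר קורס"
```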
I am testing LTI authentication on a server with UTF-8 support, roughly that from https://github.com/openwebwork/webwork2/pull/927
The LTI worked and created users with Hebrew names, and there was no need for the patch.
I think the reason it just works is that lib/WeBWorK/Request.pm was modified to apply Encode::decode_utf8() to all the request parameter values.
Thus, with the UTF-8 support code in place, the strings which are handed off to OAuth are all already Perl character strings, and not byte sequences in UTF-8 encoding.
Based on this analysis, the problem is likely to apply only to 2.14 and versions of devel which have not yet pulled in the UTF-8 support code.
Since this issue does not affect the WW 2.15 branch, it is being closed.
Just a note: I applied this patch to my 2.14 server because I have a student with an accent in their name in the LMS (D2L). It succeeded in authorizing the student via LTI. Not surprisingly the character got mangled when adding the student to the DB since the DB is not using UTF8, but it works.
There are several reports in the forums about LTI authentication failing when the LTI source (the external LMS) is sending multi-byte UTF8 characters as part of the request parameters.
This is an issue with releases 2.13 and 2.14, and possibly with earlier releases which already had LTI support. It is not critically connected to the internationalization/localization project: even regular "English only" WW sites may have student names containing accented (or other special) characters, which are encoded as multiple bytes in UTF8, the default encoding used for the transfer of the form data, including the LTI authentication data.
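To see concretely why the byte-level representation matters, here is a short Python sketch (illustrative only; Net::OAuth's actual signature base-string construction is more involved, and the secret and name here are made up). It shows that an HMAC-SHA1 signature over the correctly encoded text differs from one over accidentally double-encoded bytes, so verification fails.

```python
import base64
import hashlib
import hmac

secret = b"123456789"
name = "חסר"                          # the text the LMS actually signed
correct_bytes = name.encode("utf-8")  # what should enter the HMAC

# If the receiver holds the UTF-8 bytes but mistakes them for characters
# (a Perl byte string read as Latin-1) and encodes again, it double-encodes:
double_encoded = correct_bytes.decode("latin-1").encode("utf-8")
assert double_encoded != correct_bytes

sig_good = base64.b64encode(hmac.new(secret, correct_bytes, hashlib.sha1).digest())
sig_bad = base64.b64encode(hmac.new(secret, double_encoded, hashlib.sha1).digest())

# The signatures no longer match, so OAuth verification fails.
assert sig_good != sig_bad
```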
The core issue
The issue is related to the fact that the Perl Net::OAuth module used to verify the LTI authentication process requires the input to be in "internal Perl character encoding" and not encoded as bytes in UTF8. See https://metacpan.org/pod/release/KGRENNAN/Net-OAuth-0.28/lib/Net/OAuth.pm and in particular the section called I18N near the bottom.
Beta version of a fix is ready to be tested by people with a working WW-LMS LTI setup.
After looking into the issue, I have made a beta version of a fix available. It applies utf8::decode() to values in the %request_hash which match the same regexp that Net::OAuth uses to trigger the error message. The fix is controlled by the setting $allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters; an explanation and a commented-out line to enable it were added to conf/authen_LTI.conf.dist in the feature branch.
Testing
Testing requires a course which allows LTI authentication, and connecting from an LMS properly configured to access the WeBWorK server via LTI. The relevant files are lib/WeBWorK/Authen/LTIAdvanced.pm and lib/WeBWorK/Authen/LTIBasic.pm. To enable the fix, set $allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters = 1 in conf/authen_LTI.conf (or in a course specific config file).