openwebwork / webwork2

Course management front end for WeBWorK
http://webwork.maa.org/wiki/Main_Page

LTI authentication fails when the LTI consumer provides data which contains UTF8 multi-byte characters #915

Closed taniwallach closed 5 years ago

taniwallach commented 5 years ago

There are several reports in the forums about LTI authentication failing when the LTI source (the external LMS) is sending multi-byte UTF8 characters as part of the request parameters.

This is an issue with releases 2.13 and 2.14, and possibly with earlier releases which already had LTI support. It is not specifically tied to the internationalization/localization project: even regular "English only" WW sites may have student names containing accented (or other special) characters, which are encoded as multiple bytes in UTF-8, the default encoding for transferring the form data, including the LTI authentication data.

The core issue

The issue is related to the fact that the Perl Net::OAuth module used to verify the LTI authentication process requires the input to be in "internal Perl character encoding" and not encoded as bytes in UTF-8. See: https://metacpan.org/pod/release/KGRENNAN/Net-OAuth-0.28/lib/Net/OAuth.pm and in particular the section called I18N near the bottom, which states:

Per the OAuth spec, when making the signature Net::OAuth first encodes parameters to UTF-8. This means that any parameters you pass to Net::OAuth, if they might be outside of ASCII character set, should be run through Encode::decode() (or an equivalent PerlIO layer) first to decode them to Perl's internal character structure.
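A quick illustration of the byte-string vs. character-string distinction Net::OAuth is describing (a minimal sketch, not WW code):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# "é" as it arrives in raw UTF-8 form data is the two bytes 0xC3 0xA9.
my $bytes = "\xC3\xA9";
print "byte length: ",      length($bytes), "\n";   # 2 - two separate bytes

# Encode::decode() turns those bytes into one Perl character (U+00E9),
# which is what Net::OAuth expects to receive.
my $chars = decode( 'UTF-8', $bytes );
print "character length: ", length($chars), "\n";   # 1
```

Passing the 2-byte version to Net::OAuth is what produces a signature mismatch: it re-encodes to UTF-8, double-encoding the data.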

A beta version of a fix is ready to be tested by people with a working WW-LMS LTI setup.

After looking into the issue, I have made a beta-version of a fix available.

Testing

Testing requires a course which allows LTI authentication, and connecting from an LMS properly configured to access the WeBWorK server via LTI.

  1. Setup: It should suffice to update the relevant one of:
    • lib/WeBWorK/Authen/LTIAdvanced.pm
    • lib/WeBWorK/Authen/LTIBasic.pm
  2. Set $allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters=1 in conf/authen_LTI.conf (or in a course-specific config file).
  3. Testing should be done both with LTI data which contains multi-byte characters and with data which does not. (The simplest test cases are probably student names with accented characters.)
    • Desired outcome: successful authentication in all valid cases.
    • Negative feedback on failed authentication with details would help debug this.
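For reference, the configuration line from step 2 would look like this (the variable name comes from the beta patch; the surrounding file is standard WW Perl-syntax configuration):

```perl
# In conf/authen_LTI.conf (or a course-specific course.conf),
# enable the beta multi-byte-character fix:
$allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters = 1;
```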
taniwallach commented 5 years ago

Below is a sample Perl script which tests the basic idea of the code change. It

  1. Creates a sample OAuth signed "message" which includes some UTF8 Hebrew data.
  2. Checks that the message can be verified. (Passes)
  3. Breaks the message data by applying utf8::encode() to the value which has UTF8 data in it.
  4. Attempts to verify the tampered message data. (Fails and triggers the multi-byte character warning message from Net::OAuth.)
  5. Sets up some things so that the code added to webwork2/lib/WeBWorK/Authen/LTIAdvanced.pm can be run to "fix" the message.
  6. Includes and runs the code to "fix" the message.
  7. Attempts to verify the "fixed" message data. (Passes.)

This seems to be a good indication both that this is the setting which triggers the bug, and that the new code should fix it.

#!/usr/bin/perl -w

use strict;
use warnings;
use utf8;

use Net::OAuth;

# Print UTF-8 output without "Wide character in print" warnings.
binmode( STDOUT, ':encoding(UTF-8)' );

my $HebrewString = "חסר קורס מהרשימה";

# ======================================================================

my $request_url = 'http://testing.123/foo';
my $consumer_key = 'my_sample_consumer_key';
my $consumer_secret = '123456789';

my %defaults = (
    nonce => int( rand( 2**32 ) ),
    timestamp => time,
    consumer_key => $consumer_key,
    consumer_secret => $consumer_secret,
    # callback => $self->callback,
    signature_method => 'HMAC-SHA1',
    request_method => 'GET',
  );

$defaults{protocol_version} = Net::OAuth::PROTOCOL_VERSION_1_0A ;

# ======================================================================

my $request = Net::OAuth->request('consumer')->new(
    %defaults,
     request_url => $request_url,
     extra_params => {
        samplePayload => "This has Hebrew in UTF8 in it $HebrewString",
        foo => 'bar'
     },
);

$request->sign;

my $post_body = $request->to_post_body ; # an application/x-www-form-urlencoded POST body

print "CREATED $post_body\n\n";

# ======================================================================

# Try to verify it

my %request_hash = ( %{$request->to_hash}  );

my $test_request;

$test_request = Net::OAuth->request("request token")->from_hash( \%request_hash,
               request_url => $request_url,
               request_method => "GET",
               consumer_secret => $consumer_secret,
   );

print "About to decode the OAuth request\n";
my $decode_body = $test_request->to_post_body ; # an application/x-www-form-urlencoded POST body

print "VERIFY ORIGINAL $decode_body\n";

if ( $test_request->verify ) {
    print "\tOAuth verification passed\n\n\n";
} else {
    print "\tOAuth verification FAILED\n\n\n";
}

# ======================================================================

# Force the samplePayload payload field into UTF-8 BEFORE it gets sent to OAuth for handling. 
# This intentionally breaks it.

print "\n\nWe now BREAK the samplePayload field by encoding it into UTF-8 bytes instead of Perl characters. This will break the verification and trigger the \"multi-byte character\" warning message from Net::OAuth.\n\n\n\n";

utf8::encode($request_hash{samplePayload});

$test_request = Net::OAuth->request("request token")->from_hash( \%request_hash,
               request_url => $request_url,
               request_method => "GET",
               consumer_secret => $consumer_secret,
   );

print "About to decode the OAuth request\n";
$decode_body = $test_request->to_post_body ; # an application/x-www-form-urlencoded POST body

print "VERIFY TAMPERED $decode_body\n";

if ( $test_request->verify ) {
        print "\tOAuth verification passed\n\n\n";
} else {
        print "\tOAuth verification FAILED\n\n\n";
}

# ======================================================================

# Try to fix any fields which seem to have UTF8 already

print "\n\nWe now attempt to fix the message using the code added to webwork2/lib/WeBWorK/Authen/LTIAdvanced.pm after setting up a suitable set of configuration settings for it to be used.\n\n\n\n";

# First put in some things, so we can test with the code added to 
#      webwork2/lib/WeBWorK/Authen/LTIAdvanced.pm
my %ce;
$ce{allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters} = 1;
my $ce = \%ce;
sub debug { print( my $msg = shift ); }

# The code below is that added to webwork2/lib/WeBWorK/Authen/LTIAdvanced.pm
# except for some leading spaces deleted from the first and last line.

if ( defined($ce->{allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters})
       and ( $ce->{allow_WW_LTI_code_to_fix_what_appear_as_multi_byte_characters} == 1 ) ) {
    my ( $key, $val );
    foreach $key ( keys( %request_hash ) ) {
      $val = $request_hash{$key} ;
      if ( ($val =~ /[\x80-\xFF]/ and !utf8::is_utf8($val)) ) {
        debug("Key ${key} : Detected what appears to be a UTF8 multi-byte characters which would have been passed to OAuth and triggered an error. Applying utf8::decode to the relevant value.\n");
        utf8::decode($request_hash{$key});
      }
    }
}

$test_request = Net::OAuth->request("request token")->from_hash( \%request_hash,
               request_url => $request_url,
               request_method => "GET",
               consumer_secret => $consumer_secret,
   );

print "About to decode the OAuth request\n";
$decode_body = $test_request->to_post_body ; # an application/x-www-form-urlencoded POST body

print "VERIFY FIXED TAMPERED $decode_body\n";

if ( $test_request->verify ) {
        print "\tOAuth verification passed\n\n\n";
} else {
        print "\tOAuth verification FAILED\n\n\n";
}
mgage commented 5 years ago

Hi Tani,

Could you rebase your branch LTI-patch-multi-byte-characters as a branch off develop and then submit it as a pull request to develop? I don't think we'll be able to handle this as a hot-fix. Only a few sites are using multi-byte characters, and they can follow your analysis for themselves. If we have a pull request to develop, I think we may be able to get a few more testers.

taniwallach commented 5 years ago

Done: Instead of rebasing using git, I merged the modified code into a clean copy of develop. https://github.com/openwebwork/webwork2/pull/931

nmoller commented 5 years ago

@taniwallach : before trying to set up an environment to test everything... I just want to be sure that I'm getting it right. What you are trying to do at line L467 is to match printable characters?

Going by https://github.com/php-edifact/edifact/issues/20 , .... are you sure that we are able to correct the evil characters like ö, ç ..?

nmoller commented 5 years ago

@taniwallach @mgage : if I understood my last comment correctly..

Following https://perldoc.perl.org/perluniintro.html#Unicode-and-EBCDIC (see the Miscellaneous section) and the character lists ... if I'm getting it right ... one should add the interval \N{U+00C0} - \N{U+02AF} ... well, something like that.

taniwallach commented 5 years ago

What I am trying to do is detect what may be UTF8 encoded bytes which should not be in that format (but rather in Perl's character format) when passed into OAuth for creating/checking a signature.

The conditional code is exactly that inside the encode function in Net/OAuth/Message.pm which triggers the error message reported in the forum posts listed above (those which led to my effort to fix the problem).

Net::OAuth warning: your OAuth message appears to contain some multi-byte characters that need to be decoded via Encode.pm or a PerlIO layer first. This may result in an incorrect signature. at /usr/share/perl5/vendor_perl/Net/OAuth/Message.pm line 106.

If Perl sees characters with the high bit set, but thinks the string already consists of valid Perl characters, the second test will fail and nothing will get changed. However, when the utf8::is_utf8($str) test fails, the string appears to be a byte array holding UTF-8 bytes, in which case utf8::decode() is applied.
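A minimal demonstration of that two-part test (a sketch of the same condition used in the patch; the helper name looks_like_undecoded_utf8 is mine, and note that utf8::is_utf8() only inspects the internal flag, so this remains a heuristic):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# The same two-part test as in the patch: high-bit bytes are present,
# but the scalar's internal UTF8 flag is not set.
sub looks_like_undecoded_utf8 {
    my ($val) = @_;
    return ( $val =~ /[\x80-\xFF]/ && !utf8::is_utf8($val) ) ? 1 : 0;
}

my $bytes   = "\xC3\xA9";                  # "é" as raw UTF-8 bytes, flag off
my $decoded = decode( 'UTF-8', $bytes );   # one character, UTF8 flag on

print looks_like_undecoded_utf8($bytes),   "\n";   # 1 - would get "fixed"
print looks_like_undecoded_utf8($decoded), "\n";   # 0 - left alone
```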

The test code outside WW applied utf8::encode($request_hash{samplePayload}); to break the OAuth payload for testing purposes; in that case the OAuth verification fails, but when we apply the proposed fix it passes again.

taniwallach commented 5 years ago

Note: The data in the request hash which is being used to create a "message" for OAuth to verify is pulled out of the HTTP request parameters via the code:

    my %request_hash;
    my @keys = keys %{ $r->{paramcache} };
    foreach my $key (@keys) {
        $request_hash{$key} = $r->param($key);
        #debug("$key -> |" . $requestHash -> {$key} . "|");
    }

so it is apparently a string of bytes and not a Perl Unicode string. As I understand it, that is the root cause of the problem. The fix is intended to apply utf8::decode() when needed, transforming the sequence of UTF-8 bytes into a sequence of Unicode characters.

https://perldoc.perl.org/utf8.html

taniwallach commented 5 years ago

I did some more reading/review about Perl and Unicode.

In short:

Since ASCII characters (code points 0-127) are the same in ASCII and UTF-8, it seems that the LTI authentication code worked well enough, so long as all the data used only such characters, and applying utf8::decode() to them would not make any difference.

Thus, it could be that we should be applying utf8::decode() to all the parameter values before putting them into the %request_hash. Quite possibly we should first check that the encoding used by the request is actually UTF-8 (which seems to be the current standard) before assuming that to be the case. Even better would be to determine the actual request encoding, and then decode all parameters for the actual encoding used. I'm not sure if parameter ("key") names also need to be decoded, but at least for the LTI standard and common practice - these seem to stay within the ASCII range of characters.
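One possible shape for that up-front decoding (a hypothetical sketch, not code from the patch: the helper decode_params and its charset handling are illustrative, and parameter names are assumed to stay ASCII as noted above):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every request parameter value from the
# request's byte encoding into Perl characters before building
# %request_hash. Parameter names are assumed to be ASCII-only.
sub decode_params {
    my ( $params, $charset ) = @_;
    $charset ||= 'UTF-8';    # the de-facto standard for form data today
    my %decoded;
    for my $key ( keys %$params ) {
        $decoded{$key} = decode( $charset, $params->{$key} );
    }
    return \%decoded;
}

# "André" as it arrives on the wire: 6 UTF-8 bytes.
my $raw   = { lis_person_name_full => "Andr\xC3\xA9" };
my $fixed = decode_params($raw);

print length( $raw->{lis_person_name_full} ),   "\n";   # 6 - bytes
print length( $fixed->{lis_person_name_full} ), "\n";   # 5 - characters
```

With all values decoded once at this boundary, nothing downstream (including OAuth signing) would ever see raw UTF-8 bytes.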


taniwallach commented 5 years ago

I am testing LTI authentication on a server with UTF-8 support, roughly that from https://github.com/openwebwork/webwork2/pull/927

The LTI worked and created users with Hebrew names, and there was no need for the patch.

I think the reason it just works is that lib/WeBWorK/Request.pm was modified to apply Encode::decode_utf8() to all the request parameter values.

Thus, with the UTF-8 support code in place, the strings which are handed off to OAuth are all already Perl character strings and not byte sequences in UTF-8 encoding.

Based on this analysis, the problem is likely to apply only to 2.14 and versions of develop which have not yet pulled in the UTF-8 support code.

taniwallach commented 5 years ago

Since this issue does not affect the WW 2.15 branch, it is being closed.

dlglin commented 3 years ago

Just a note: I applied this patch to my 2.14 server because I have a student with an accent in their name in the LMS (D2L). It succeeded in authorizing the student via LTI. Not surprisingly, the character got mangled when the student was added to the DB, since the DB is not using UTF-8, but it works.