unee-t / bz-database

Scripts and schema for the bz database so we can build the BZ FE
GNU Affero General Public License v3.0
0 stars 2 forks source link

DBD::mysql::db do failed: Lambda API returned error: Invalid Request Content. Could not parse request body into json: Invalid UTF-8 start byte 0x92 #73

Open kaihendry opened 6 years ago

kaihendry commented 6 years ago

500 errors are something to do with https://dashboard.unee-t.com/show_bug.cgi?id=66985 on production

[Wed Jun 27 01:41:09.181738 2018] [:error] [pid 1210] DBD::mysql::db do failed: Lambda API returned error: Invalid Request Content. Could not parse request body into json: Invalid UTF-8 start byte 0x92\n at [Source: [B@1bbe91ec; line: 1, column: 425] [for Statement "INSERT INTO longdescs (bug_when, thetext, work_time, who, bug_id, isprivate) VALUES (?,?,?,?,?,?)"] at /opt/bugzilla/Bugzilla/Object.pm line 742.\n\tBugzilla::Object::insert_create_data("Bugzilla::Comment", HASH(0x561f348f4500)) called at /opt/bugzilla/Bugzilla/Bug.pm line 1042\n\tBugzilla::Bug::update(Bugzilla::Bug=HASH(0x561f341cd5e8)) called at /opt/bugzilla/process_bug.cgi line 374\n\tModPerl::ROOT::Bugzilla::ModPerl::ResponseHandler::opt_bugzilla_process_bug_2ecgi::handler(Apache2::RequestRec=SCALAR(0x561f346e65e0)) called at /usr/lib/x86_64-linux-gnu/perl5/5.26/ModPerl/RegistryCooker.pm line 207\n\teval {...} called at /usr/lib/x86_64-linux-gnu/perl5/5.26/ModPerl/RegistryCooker.pm line 207\n\tModPerl::RegistryCooker::run(Bugzilla::ModPerl::ResponseHandler=HASH(0x561f3482bb60)) called at /usr/lib/x86_64-linux-gnu/perl5/5.26/ModPerl/RegistryCooker.pm line 173\n\tModPerl::RegistryCooker::default_handler(Bugzilla::ModPerl::ResponseHandler=HASH(0x561f3482bb60)) called at /usr/lib/x86_64-linux-gnu/perl5/5.26/ModPerl/Registry.pm line 32\n\tModPerl::Registry::handler("Bugzilla::ModPerl::ResponseHandler", Apache2::RequestRec=SCALAR(0x561f346e65e0)) called at /opt/bugzilla/mod_perl.pl line 137\n\tBugzilla::ModPerl::ResponseHandler::handler("Bugzilla::ModPerl::ResponseHandler", Apache2::RequestRec=SCALAR(0x561f346e65e0)) called at -e line 0\n\teval {...} called at -e line 0\n

kaihendry commented 6 years ago

My initial theory this was to do with some JSON string conversion error when feeding it to the Lambda in Aurora. IIUC we build JSON strings by hand, which I am paranoid will bite us. Perhaps https://dev.mysql.com/doc/refman/8.0/en/json-creation-functions.html would help in this instance.

franck-boullier commented 6 years ago

Checked with the user The user was trying to copy the following text in a comment:


On 26/6/2018 11:12 PM, Anh Truong wrote:

Hi we didn’t purchase any other channels. Can you please check. We do not want to pay for that as we did not use the service. They should be some parental control on the machine.


We need to understand why BZ is chocking on this...

franck-boullier commented 6 years ago

What is even weirder is that the user is able to copy other emails with no issue, this specific string of characters seems to trigger some weird error...

kaihendry commented 6 years ago

I'm seeing this issue again in the last hour on production: https://ap-southeast-1.console.aws.amazon.com/cloudwatch/home?region=ap-southeast-1#logEventViewer:group=bugzilla;stream=40298bbb8b747f099cff07b8b7b01b61d24659b05f3619210e7c5cf6a1b2c5d0;filter=post_bug.cgi;start=2018-07-01T06:37:25Z

franck-boullier commented 6 years ago

OK I'm looking into that now...

franck-boullier commented 6 years ago

User mention this issue here too: https://unee-t.slack.com/archives/C9JKCF0KX/p1530520080000197

kaihendry commented 6 years ago

https://ap-southeast-1.console.aws.amazon.com/cloudwatch/home?region=ap-southeast-1#logEventViewer:group=bugzilla;filter=%255Bip%2520%252C%2520user%252C%2520username%252C%2520timestamp%252C%2520request%2520%253D%2520*HTTP*%252C%2520status_code%2520%253D%25205**%252C%2520bytes%252C%2520referrer%252C%2520agent%255D;start=P1D

is the search query I am using to track these issues in Bugzilla production btw.

[ip , user, username, timestamp, request = *HTTP*, status_code = 5**, bytes, referrer, agent]

Then click show in stream to get context.

screen shot 2018-07-03 at 11 36 55 am
kaihendry commented 5 years ago

Since several instances of this bug again today: https://media.dev.unee-t.com/2018-11-12/invalid-utf8.mp4

I suspect moving to AuroroDB native functions might solve this mapping issue. However native functions is not support in 5.7 which adds some confusion. AWS support gave no firm timeline or an idea of a migration for current 5.7 users like us. Perhaps we should spawn a Aurora 5.6 version and experiment there.

In the meantime we should collect test cases of the bodies that trigger this issue.

franck-boullier commented 5 years ago

Perhaps we should spawn a Aurora 5.6 version and experiment there.

Sounds like a reasonable idea. Maybe we can create a new environment from scratch and run the Uilicious automated tests there to see if there are some unexpected issues.

@kaihendry what do you think?

kaihendry commented 5 years ago

IIUC our current schema doesn't actually support utf8 so a translation error might occur. https://github.com/unee-t/bz-database/issues/110

Although there could be some JSON encoding translation issue too, hence I am leaving this bug open. We should review how we call lambda functions.

franck-boullier commented 5 years ago

Schema v4.32 is a fix for that.

kaihendry commented 5 years ago

Just noticed several errors of this sort of production. Re-opening. I'm sharing the sensitive log of the last 8 hrs on Slack.

kaihendry commented 5 years ago

Moving forward we need to base64 encode the payloads to avoid this issue as AWS advises in https://console.aws.amazon.com/support/cases#/6092011631/en (DEV account support)

Please refer to https://github.com/unee-t/frontend/issues/721#issuecomment-520713818

kaihendry commented 5 years ago

Payloads need to be in a JSON format, hence:

BEGIN
CALL mysql.lambda_async(CONCAT('arn:aws:lambda:ap-southeast-1:812644853088:function:alambda_simple')
  , CONCAT('{"plb64":"',  TO_BASE64( JSON_OBJECT ('notification_type', 213) ), '"}'))
;
END

Notice the plb64 key, which https://github.com/unee-t/lambda2sns/commit/985d570403ef030a8df7519a231da9065cb0613d looks out for

kaihendry commented 5 years ago

Re-opened the AWS support issue, since I am struggling to get base64 encoded payload down without carriage returns which make JSON choke. https://s.natalian.org/2019-08-13/control-char.mp4

kaihendry commented 5 years ago

https://console.aws.amazon.com/support/home#/case/?displayId=6092011631&language=en has just responded saying:

They have confirmed, that if you convert the entire JSON object/string to BASE64 then it will not produce a well formed JSON. A workaround they have recommend is to convert to BASE64 each string value in the JSON payload and then on the receiving end in the implementation of the corresponding lambda convert the values from BASE64 to the original string.

This is problematic because:

kaihendry commented 5 years ago

base64 each value is the official answer from AWS @franck-boullier I'll update the procedures in dev to test.

I understand the inconvenience and yes I have recommend this feature to our internal service team to be added in the upcoming versions. At this point only the provided workaround is plausible i.e

If the JSON payload looks like this

{"test 1": "Jožko Mrkvičká1", "test 2": "Jožko Mrkvičká 2"}

then this is how it should be converted to BASE64:

JSON_OBJECT('test 1', TO_BASE64("Jožko Mrkvičká 1"), 'test 2', TO_BASE64("Jožko Mrkvičká 2"))

This way it will minimize the length of each values to be converted and the conversion will not include the JSON formatting characters ('":{}[]).

kaihendry commented 5 years ago

I've created a function called push to replace _alambdasimple & this is able to handle base64 encoded values. I have not committed the changes since I want to test it first in the dev environment.

So I intend to make schema changes @franck-boullier to see if it works. If you could assist testing, that would be great.

The design is a simplified https://media.dev.unee-t.com/2019-08-02/lambda2sns.png. First stage is a push to a single queue. The queue triggers process items on the queue. The queue is unfortunately not FIFO since "FIFO SQS queue is not supported as an event source".

franck-boullier commented 5 years ago

@kaihendry where can I find the code for the push function?

kaihendry commented 5 years ago

The code is pushed the in branch https://github.com/unee-t/lambda2sns/tree/lambda2sqs_push

The function push is setup in the Cloudformation template here: https://github.com/unee-t/lambda2sns/blob/lambda2sqs_push/template.yaml

And implemented here: https://github.com/unee-t/lambda2sns/tree/lambda2sqs_push/push

It's different to the old logic as documented in the diagram here https://media.dev.unee-t.com/2019-08-02/lambda2sns.png in two ways: