unee-t / lambda2sqs

Relays SQL triggered payloads to MEFE via a queue
https://ap-southeast-1.console.aws.amazon.com/lambda/home?region=ap-southeast-1#/applications/lambda2sqs
GNU Affero General Public License v3.0
0 stars 4 forks source link

Error 1452: Cannot add or update a child row: a foreign key constraint fails #22

Closed kaihendry closed 5 years ago

kaihendry commented 5 years ago

Lambda2sns works today, by passing a payload originating from a SQL trigger to MEFE, wait for a response and then sets a reply, back in the unee_t_enterprise database. Since the database's triggers leverage mysql.lambda_async, it relies on lambda2sns to give it reply, after it has initiated the HTTP request/response with MEFE.

This microservice architecture has a problem when the same message is processed twice, the reply errors, noticed whilst testing a message from the DLQ. An erroring lambda means especially with a DLQ queue https://github.com/unee-t/lambda2sns/issues/21#issuecomment-516327525, means that the message gets retried again & again with the same failure.

This breaks the microservice tenant code must be able to process the same event repeatedly without unwanted effects, with an SQL error, when invoking CALL ut_creation_unit_mefe_api_reply. This is the error log: https://media.dev.unee-t.com/2019-07-30/logs.txt

franck-boullier commented 5 years ago

this error message does NOT point at the error being the system trying to enter the same information twice.

This error message:

"Error 1452: Cannot add or update a child row: a foreign key constraint fails (`unee_t_enterprise`.`external_map_user_unit_role_permissions_level_3`, CONSTRAINT `ext_map_user_unit_role_permissions_rooms_created_by` FOREIGN KEY (`created_by_id`) REFERENCES `uneet_enterpri)"

Tells you that you are trying to add an information which is impossible to add. That is to say this information MUST exist in a reference table and it does NOT exist in the reference table.

AFAICT this is NOT an issue of re-trying the same payload several times.

kaihendry commented 5 years ago

I can't reproduce this with EDIT_USER. I.e. trigger an error by replaying the message. https://s.natalian.org/2019-08-08/no-error.mp4

https://logs.dev.unee-t.com/l?since=8&uuid=&reqid=55af7efd-e591-447d-8a6d-b97a50b6b7bd

Next I will try CREATE_UNIT. I see the problem, without repeasting the payload, requestID "a751da7d-97e6-4203-988b-c52e95e4c06f" (second field) upon https://logs.dev.unee-t.com/

https://s.natalian.org/2019-08-08/create_unit.mp4

franck-boullier commented 5 years ago

It seems that the DEV/Staging environment is beyond repair (I've tried to fix it for the past 3 days and it still does not work as expected). Do we have a way to restore the DEV/Staging to a sane state somehow? Ideally, we have a working backup somewhere that we can use to restore:

franck-boullier commented 5 years ago

I can't reproduce this with EDIT_USER. I.e. trigger an error by replaying the message. https://s.natalian.org/2019-08-08/no-error.mp4

OK this is good news!

franck-boullier commented 5 years ago

https://s.natalian.org/2019-08-08/create_unit.mp4

@kaihendry In this video at 1.50 you mention that "we have an issue create a unit". Can you help me better understand how you can see that? What are the symptoms that allow you to reach that conclusion? Thanks

kaihendry commented 5 years ago

Not sure if I can be clearer. I create a Unit in the enterprise. The CREATE_UNIT message comes through and fails as shown in the logs.

franck-boullier commented 5 years ago

@kaihendry now that we have restored the DEV/Staging can you try again and see if the error is still there?

kaihendry commented 5 years ago

Struggling to produce a CREATE_UNIT message from https://enterprise.dev.unee-t.com/enterprise/Manage_Buildings_list.php

kaihendry commented 5 years ago

https://github.com/unee-t/enterprise/wiki/How-it-works-Create-a-new-property shows how to manufacture a CREATE_UNIT message payload.

I can confirm I'm creating units successfully in the DEV environment: https://logs.dev.unee-t.com/q?end=1565754162&reqid=6d92b080-128c-464d-ad94-cdd184dba92d&start=1565725362

To summarize this issue appeared to occur due to some bad state of the DEV database. The fix was to restore it from a good source.