omise / omise-woocommerce

Omise WooCommerce Plugin
https://docs.opn.ooo/woocommerce-plugin
MIT License
47 stars 27 forks source link

Preventing async requests from Omise Webhook and Callback URI from updating WC Order status twice. #177

Closed guzzilar closed 4 years ago

guzzilar commented 4 years ago

1. Objective

In a race-condition where Omise Webhook is fired at the same time as buyer returns to the Callback URI (return_uri). A particular WooCommerce Order object will get updated its status twice which creates a consequence of a confirmation email is being sent to the recipient twice causing a confusion in a business flow that rely on this particular email.

Screen Shot 2563-07-15 at 01 52 59 copy

This happens because at the moment where Omise Webhook and Callback URI are triggered asynchronously at the same time, they both retrieve WC Order object with the pending status and update it to processing using WC_Order::payment_complete(), which triggers an email function to be executed twice.

This pull request is providing a solution to prevent the async request from creating a race-condition, by integrating with WC_Queue feature to handle those async request by executing it by order.

Related information: Related issue(s): T22125 (internal ticket)

2. Description of change

3. Quality assurance

🔧 Environments:

✏️ Details:

⚠️ At the moment, I have no other approach to test the race-condition but to try create and place an order again and again until the Webhook event is finally arrived to the store first before user get redirected back to the Callback URI.

⚠️ Also, to test this pull request, you may use ngrok to create HTTPS public url for Omise Webhook.

⚠️ To check if the Webhook is finally arrived first, you may query database with the following command: SELECT * from wp_actionscheduler_actions; Or SELECT action_id, hook, status, extended_args from wp_actionscheduler_actions ORDER BY action_id DESC LIMIT 5;

Screen Shot 2563-07-15 at 04 58 52 copy

In a typical case, you will get the action omise_async_payment_result with context: callback come first following by context: webhook. This means the Callback URI is being executed first (buyer arrives first). And another way around, if Webhook arrived first or at the same time as buyer.

Once everything is in placed and you have set Webhook properly, you may try to place an order again until the Webhook is finally arrived to the store first before the Callback URI being executed.

Screen Shot 2563-07-15 at 04 56 38

If you check at the Admin Order Detail page, there will no longer be a double customer note (meaning that the request does not reach to the WC_Order::payment_complete() function).

Screen Shot 2563-07-15 at 05 38 40 copy

4. Impact of the change

There is a limitation of concurrent that Action Scheduler library can handle. Please check the following document as reference: https://actionscheduler.org/perf

Action Scheduler will only process actions in a request until: • 90% of available memory is used • processing another 3 actions would exceed 30 seconds of total request time, based on the average processing time for the current batch • in a single concurrent queue

Also

By default, Action Scheduler will only process actions for a maximum of 30 seconds in each request. This time limit minimises the risk of a script timeout on unknown hosting environments, some of which enforce 30 second timeouts.

By default, Action Scheduler will claim a batch of 25 actions. This small batch size is because the default time limit is only 30 seconds; however, if you know your actions are processing very quickly, e.g. taking microseconds not seconds, or that you have more than 30 second available to process each batch, increasing the batch size can slightly improve performance.

By default, Action Scheduler will run only one concurrent batches of actions. This is to prevent consuming a lot of available connections or processes on your webserver.

However, it shall not effect to its capability to handle the queue. Just for some high-volume transaction per minute website, it may take some seconds or minutes until all the queue get executed.

As stated in WP-Cron: https://developer.wordpress.org/plugins/cron/#what-is-wp-cron

5. Priority of change

Immediate.

6. Additional Notes

Just to clarify the relationship between 3 names.

However, there is one point that we should understand of WP-Cron behaviour, to understand how the WC_Queue work. As stated in WP-Cron document: https://developer.wordpress.org/plugins/cron/#what-is-wp-cron

WP-Cron works by checking, on every page load, a list of scheduled tasks to see what needs to be run. Any tasks due to run will be called during that page load.

WP-Cron does not run constantly as the system cron does; it is only triggered on page load.

With the system scheduler, if the time passes and the task did not run, it will not be re-attempted. With WP-Cron, all scheduled tasks are put into a queue and will run at the next opportunity (meaning the next page load). So while you can’t be 100% sure when your task will run, you can be 100% sure that it will run eventually.

The WP-Cron is only working when "any" of page get loaded. It's not time-based as a typical system scheduler. Meaning that, even though we set some schedule to be executed after "5" mins, but if there is no one open "any" page of the WordPress website, then that schedule won't be triggered.

guzzilar commented 4 years ago

I'll write out some more test cases this evening for those failure, pending, authorized cases.

guzzilar commented 4 years ago

Not sure if this PR is concerning too many areas. Let me know if it'd be better to split some changes out to another different PR. (i.e. the Webhook's payload format from Object to array)

danfowler commented 4 years ago

I think fine to keep as one.

guzzilar commented 4 years ago

A new approach has been provided at #179