strangerstudios / paid-memberships-pro

The Trusted Membership Platform That Grows with You: Restrict access to content and charge recurring subscriptions using Stripe, PayPal, and more. Fully open source. 100% GPL.
https://www.paidmembershipspro.com
Other
471 stars 360 forks source link

PayPal IPN Validate fails sometimes (consequences...) #1787

Open mircobabini opened 3 years ago

mircobabini commented 3 years ago

Describe the bug Read the history of this investigation below. Also read this comment: https://github.com/strangerstudios/paid-memberships-pro/issues/1787#issuecomment-1210671805

tl;dr

An IPN Validate failure during a recurring_payment request means we are not creating the recurring order on site and not emitting the related invoice, which is potentially illegal in some countries.

Ideas

  1. We recently discussed an IPN History panel somewhere which can be useful to resend failed IPN requests, debug, etc.;
  2. We should send an admin email notification when an IPN process is completely skipped in some edge cases (like unhandled recurring_payment requests because of: ipn validation issues, subscription order is missing, ...);
  3. We should consider send to PayPal a HTTP code != 200, to ask PayPal to retry as soon as possible instead of considering the IPN request completed.
  4. Extend the wp_remote_post timeout to 60s when doing an IPN Validation request?

Investigation

I've found a few errors in 1 month of ipn debug logs:

cURL error 56: OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104 Googling for it I've found may having ALMOST the same issues; they have no "SSL_ERROR_SYSCALL" btw, but they have the connection closed by peer after 5 seconds. wp_remote_post without explicit changes to timeout param uses timeout=5; I've seen others using timeout=60 (ex. WC).

I would suggest to change some pmpro_ipnValidate things:

  1. pass $body param as array, wp_remote_post will do the rest
  2. wp_unslash( $body ) which is a deep stripslashes replacement
  3. use wp_safe_remote_post
  4. set 'timeout' => 60

Also, would be great to know why we are using:

To Reproduce Who knows... I've about 100-120 recurring_payment IPN messages every day... and experienced the validate issue twice in a month.

mircobabini commented 3 years ago

1st case error: cURL error 56: OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104

    ERROR
    Error Info: Array
(
    [http_request_failed] => Array
        (
            [0] => cURL error 56: OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
        )

)

    WP_Error Object
(
    [errors] => Array
        (
            [http_request_failed] => Array
                (
                    [0] => cURL error 56: OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
                )

        )

    [error_data] => Array
        (
        )

    [additional_data:protected] => Array
        (
        )

)
mircobabini commented 3 years ago

2nd case error: cURL error 28: Operation timed out after 5000 milliseconds with 0 bytes received

    ERROR
    Error Info: Array
(
    [http_request_failed] => Array
        (
            [0] => cURL error 28: Operation timed out after 5000 milliseconds with 0 bytes received
        )

)

    WP_Error Object
(
    [errors] => Array
        (
            [http_request_failed] => Array
                (
                    [0] => cURL error 28: Operation timed out after 5000 milliseconds with 0 bytes received
                )

        )

    [error_data] => Array
        (
        )

    [additional_data:protected] => Array
        (
        )

)
eighty20results commented 3 years ago

You are correct about why the cURL error is thrown. If the IPN server spends more than 5 seconds waiting, the connection is reset.

From the standard error.h definition: #define ECONNRESET 104 /* Connection reset by peer */

The challenge with setting a 60 second timeout is that on a very busy system, you run the risk of exhausting the # of sockets available to the site and triggering a much bigger and more complex set of problems. I'd suggest a stepwise climb up in the timeout values, so instead of a massive 20x increase, maybe go for something a bit more conservative? Maybe 10 seconds? If the incoming connections isn't processed in 10 seconds, I would argue there's a general performance issue on the site that should be addressed. Could highlight that to end-users if/when it happens..?

mircobabini commented 3 years ago

Thanks for your feedback @eighty20results!

Maybe 10 seconds?

Yeah, why not. We can also make it filterable just in case.

Could highlight that to end-users if/when it happens..?

I think that the IPN system needs something to notify the admin when this happens; I mean, you can have log enabled or not, but when you face this kind of issues you are missing an order; then probably missing an invoice and so on.

This can be an issue for many european businesses: missing an invoice is something that can lead the business to administrative sanctions in some cases.

A constant like IPN_NOTIFY_ERRORS (bool) might be useful as well... or a filter, I don't really like constants btw.

mircobabini commented 2 years ago

This happened to me just one more time in 6 months, during a pmpro_ipnValidate() call to PayPal:

[http_request_failed] => Array
(
    [0] => cURL error 35: OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to www.paypal.com:443
)

Closing this for now.

mircobabini commented 2 years ago

Reopening after some further investigation on my logs.

I recently had: [0] => cURL error 28: Connection timed out after 10001 milliseconds [0] => cURL error 28: Operation timed out after 10001 milliseconds with 0 out of 0 bytes received [0] => cURL error 6: Could not resolve host: www.paypal.com [0] => cURL error 35: OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to www.paypal.com:443

✅ One during a recurring_payment_profile_created (amen, it's nothing more than a log handler).

⚠️ One during a recurring_payment_profile_cancel (amen, even if membership status does not reflect the gateway status if the cancellation was issued from the gateway, our cron will notice the expiration and the customer will lose the membership sooner or later. Not too bad.)

‼️ TWO during a recurring_payment call. This is bad:

Reported some ideas on top.