Webhooks are being sent in incorrect order.

jofftiquez commented 4 years ago

The more information we have the easier it is for us to help. Feel free to remove any sections that might not apply

Issue

Sometimes webhooks are being sent incorrectly. Specific events:

customer.subscription.created
customer.subscription.updated

This causes our system to record the subscription status incorrectly. In the stripe dashboard, the actual status of the subscription is already active, however our system recorded incomplete because we received the created event later that it's updated counterpart.

I wonder if this will also happen in production?

Expected Behavior

Expected behavior would be:

customer.subscription.created first
then customer.subscription.updated

Steps to reproduce

Note network throttling might help. I have a slow internet.

Create a checkout session
Checkout using session id
Checkout success
Webhooks successful but, events are send incorrectly.

Traceback

Correct (created then updated)

Wrong (updated then created)

Environment

ArchLinux

jofftiquez commented 4 years ago

Another info that might be helpful. There are 2 of us using the cli on the same stripe test account. Both running on same app url http://localhost:7500 but running on different computers and network.

We found out that somehow the events from machine 1 is being accommodated by machine 2. Weird.

tomer-stripe commented 4 years ago

Hey @jofftiquez! This isn't a technically a bug and the behavior isn't specific to the CLI. We do not provide explicit ordering for webhook events so it's possible for an updated event to arrive before the corresponding created one (in production too). Check out https://stripe.com/docs/webhooks/best-practices#event-ordering for more information.

We found out that somehow the events from machine 1 is being accommodated by machine 2.

This is how it works in production too so the CLI is mimicking that behavior. We made the CLI mimicking production behavior as much as possible so that there aren't surprises when you go live :)

jofftiquez commented 4 years ago

Hey @tomer-stripe thanks! Got it.

dberke711 commented 4 years ago

@tomer-stripe How is this not considered a bug? I am seeing the same issue, and it is wreaking havoc with my integration. If I can't rely on the subscription.created > subscription.updated > session.completed events coming in that order, how do I know what the current status is of my subscription? Note that those events also need to be synchronous - subscription.updated shouldn't be sent until Stripe receives a 200 response from the subscription.created webhook (same thing with session.completed waiting for subscription.updated)... otherwise, this could lead to a race condition on my server.

I am relying on the subscription status field, and if the last webhook event that I receive happens to be subscription.created, it will have a status of "incomplete" which will overwrite the status of "active" sent by the subscription.updated event that arrived first.

dberke711 commented 4 years ago

So the solution I came up with is to completely ignore the subscription data that is posted when my webhook handler is called. Rather than relying on that data, I use the subscription ID to call the subscriptions.retrieve API to get the current subscription data. THAT data is up-to-date, so I don't have to worry about the order of the subscription and session events arriving.

I think the documentation on handling session completion is misleading. Under "webhooks" on this page https://stripe.com/docs/payments/checkout#payment-success it reads:

Stripe sends the checkout.session.completed event for a successful Checkout payment. The webhook payload includes the Checkout Session object, which contains information about the Customer, PaymentIntent, or Subscription,

While this may be technically correct, it implies that the whole process has completed, when in actuality the corresponding events may not have yet been sent. I think that should be clarified to avoid confusion.

spiffytech commented 2 years ago

Is there a more robust solution than refetching? Receiving multiple webhooks will trigger multiple refetches, and then I'm just hoping the resource has settled by the time I trigger the refetches. If it hasn't, then since the refetches will complete in non-deterministic order, I still have a race condition.

I can find some way to ensure my (distributed) app only does the refetch for one webhook at a time, but engineering my own solution here seems like a lot just to know if a subscription is ready for provisioning.

fourcolors commented 2 years ago

I think this was a little miss leading as well but each event object contains a created timestamp. Checking the events it seems like this created filed is accurate. @tomer-stripe can you confirm this?

I'm guessing the best solution is to store the created timestamp and only update if the new event happened after the last event. This can be a simple upsert style update which a check on this column if you're using traditional db like postgres.

calling retrieve is probably the simplest solution though.

spiffytech commented 2 years ago

I'm guessing the best solution is to store the created timestamp and only update if the new event happened after the last event.

In some of my tests, multiple webhooks for a resource have come through with the same created value. That seems not unlikely, since the field is relatively low resolution at whole seconds.

For now I've settled on treating webhooks as a signal to retrieve, using Redlock to ensure no single resource has parallel retrieve operations happening.

ccssmnn commented 2 years ago

Running into this as well. First thinking that my webhooks may be too slow, and I need transactions on my database. Then I saw the events coming in the "wrong" order. Unfortunately, the created field resolution is too low to use it as a workaround. I'm afraid of running into usage limits some day when fetching data from stripe on every event to receive the newest data.

mstaicu commented 2 years ago

Also running in this issue. Any other workarounds that the community found?

Since I'm using NATS Streaming Server with the node-nats-streaming NPM package, I was thinking of emitting events to my data streaming system and have any listeners interested in those events reject or acknowledge the event data based on the business logic. That way I would have a fast way of rejecting the events and accepting them based on my logic, instead of relying on Stripe to resend the events or have the events in order. I also deeply believe that this is the mother of all workarounds

Stripe event -> Webhook case handler -> Emit NATS event -> Listeners reject / accept message

spiffytech commented 2 years ago

Stripe has a /events endpoint you can poll, instead of relying on push-based webhooks. I'm considering adopting it, but I haven't verified what order it lists events in. I'm hoping that they're in a sorted order since they're delivered as an array without any worries about transmission order.

razorness commented 2 years ago

I don't know how a message bus could help determining the current version of an object. However, I came to the conclusion that Stripe needs to implement a timestamp precise enough to determine which version of an object is the current.

I came up with a two phase solution:

Try inserting object into data base and update row if event creation timestamp is definitely newer. F.e. insert statement with update on conflict combined with where clause against event creation timestamp.
If number of affected rows is 0, I fetch object from REST API and just update object in database.

To ensure no conflicting overwrite, I implemented a global lock on given object id. Luckily, stripe ids are globally unique.

spiffytech commented 2 years ago

The /events endpoint does appear to return items in reverse-chronological order (at least for a given resource - I haven't verified overall order, but I would presume it'd show the same behavior). This behavior is not mentioned in the docs.

Here's what I get after passing test data through jq:

abdel-ships-it commented 1 year ago

I understand this is by 'design' as per the event-ordering docs I spent a whole day building webhooks that rely on the lifecycle of the subscription, but now I realise the order is not gaurenteed, I don't know what to do.

And polling or calling the API everytime the user logs seems like the wrong thing to do, whats the recommended solution here? Are consumers of the stripe API supposed to implement some form of events bus system and ordering the events by created time?

I know the stripe CLI has nothing to do with this, but if someone in the team can re-direct us to the right place for this, that would be great!

spiffytech commented 1 year ago

whats the recommended solution here?

When I asked Stripe support they said to to just disregard each webhook's payload and refetch the resource the webhook is for. Looking around online, this seems to be the popular solution.

Are consumers of the stripe API supposed to implement some form of events bus system and ordering the events by created time?

Event timestamps are too low-resolution for recipients to order them by creation time: timestamps have whole-second resolution, and multiple related events can be generated within the same second.

Further, if timestamps were millisecond-denominated, there's a possibility we'd discover Stripe creates events within a single millisecond. There's still some risk of a data race (though you may be comfortable with it).

If you want guaranteed event ordering, you have two choices:

Query /events, since Stripe seems like they have an internal oracle for event ordering that they don't expose to the public.
Serialize webhook processing within your application, refetching resources while guaranteeing you only handle only one event at a time.

PiotrDabkowski commented 1 year ago

@tomer-stripe This is super problematic for your clients. It would be possible to send them in order from your side, by sending requests to webhooks, one at a time, and by waiting for the 200 status code response (or a failure after a number of retries) and only then sending the next event. Instead you move the complexity to your clients.

The fact that it is recommended to ignore webhook request payload by your support and query API is a complete BS. There can still be a race condition, unless client ensures that only 1 request per given customer is processed globally. Did you try to implement a reliable solution using your webhooks yourself? It is a nightmare.

Finally, could you at least provide more fine-grained event samples (instead of seconds) such that your clients can reorder the events on their own?

Really sloppy design on your side, heavily disappointed and we are now strongly considering another payment processor.

Merott commented 1 year ago

The workaround that has worked for me without problems so far is to use a create op for customer.subscription.created, and upsert for customer.subscription.updated.

I have a unique key constraint on the Stripe subscription ID in my database. If creating a new record in response to customer.subscription.created fails with a unique key constraint error, I just ignore it. I know the initial record could only have been created in response to a customer.subscription.updated event.

It's not an ideal solution, but it's the best I could come up with. I'm watching this space for a better solution!

Here's the high-level TypeScript code:

const stripeWebhookEventHandlers: Records<
  (e: Stripe.Event) => void | Promise<void>
> = {
  'customer.subscription.created': handleSubscriptionEvent(async event => {
    // Stripe doesn't guarantee the order in which webhooks are called, which
    // means we may receive the `.create` event AFTTER receiving the `.update`
    // event once the subscription is activated. We don't want to accidentally
    // overwrite the subscription with an outdated event. that's why we'll use
    // a `create` operation here, while using `upsert` elsewhere.
    try {
      await syncSubscriptionViaWebhook(event.data.object, 'create')
    } catch (error) {
      // if it's a unique key constraint error, it means we've already created
      // the subscription via a more up-to-date (but out-of-order) event
      if (isPrismaError(error, PrismaError.UniqueKeyConstraint)) {
        return
      }

      throw error
    }
  }),

  'customer.subscription.updated': handleSubscriptionEvent(async event => {
    await syncSubscriptionViaWebhook(event.data.object, 'upsert')
  }),

  'customer.subscription.deleted': handleSubscriptionEvent(async event => {
    await syncSubscriptionViaWebhook(event.data.object, 'upsert')
  }),
}

gspirov commented 1 year ago

I know this is an old issue but let me introduce what kind of "workaround" I've implemented on my end.

Note: Even if this "solution" provides a way of consistency it comes with multiplied cost of performance and resources overhead.

First of all, IMHO the events should be fully sequential. When state of an entity (customer, subscription, invoice) is involved each event of the sequence should be co-dependent on the previous one and the processor (Stripe in this case) must guarantee on their end that you receive them in the proper order. I still have doubts that they went with this asynchronous approach because of huge technical architecture issues. It's unacceptable to point in their documentation that ensuring event ordering is job of the integrator. See

Allowing multiple events on you server to be processed simultaneously even with refreshing the #StripeObject by ID from their SDK's is potentially is error prone and dangerous because you'll end up in huge percentage of cases with Unique Constraint Violation. This would happen because of the low time gap between the event triggering from Stripe's processor side.

Imagine following event sequence:

customer.subscription.updated {"created": 1696754206}
customer.subscription.created {"created": 1696754206}

You have database table for storing the subscriptions. On your server web hook implementation you optimistically try to insert new subscription if does not exist each time an event is received. Due to low time gap in huge percentage of the cases there will be serious race conditions due to uniqueness. Moreover... if your application is designed to respond with errors for such cases, Stripe automatically will retry calling your server which I find as an overkill.

What we want to achieve:

Preventing event race conditions between processing events over same entity (customer, subscription, invoice).
Nevertheless it's not the best solution, we want to have valid state of an entity during processing of the event (the proper one would be if the payload of the event is the valid one at the time when event has been sent).
Depending on fresh state of the entity (re-fetched via Stripe's SDK) we want to have awareness whether or not we should process the data from the current event of execution.
Ability to fully control retries over failed events.
Of course... integrity.

Implementation:

Let's define our storage where we're gonna keep incoming events from Stripe. I will choose PgSQL as storage engine but you can implement it in any RDBMS system.

create type stripe_event_status_enum as enum ('pending', 'processing', 'processed', 'failed');

create table stripe_event (
    id serial primary key,
    event_id varchar(255) not null constraint uq_stripe_event_id unique,
    event_name varchar(255) not null,
    payload json not null,
    status stripe_webhook_event_queue_status_enum not null default 'pending',
    created_at timestamp default not null,
    executed_at timestamp
);

In you web hook you need only to store the incoming event adjusted to our newly defined queue table:

webhook.php

// verify stripe signature...
try {
     $event = Stripe\Webhook\Webhook::constructEvent(
         file_get_contents('php://input'), 
         $_SERVER['STRIPE_SIGNATURE'], 
         'your-stripe-server-secret-here'
     );

     $db->insert([
         'event_id' => $event->id, 
         'event_name' => $event->type, 
         'payload' => $event->toJSON(), 
         'created_at' => (new DateTime)->setTimestamp($event->created)
     ]);

     // respond with 200 (OK)
} catch (Stripe\Exception\SignatureVerificationException $verificationException) {
    // respond with 400 (bad request)
}

Create worker script which will infinitely loop over event queue table in executed in separate process (the important part here is that you need to ensure that single process of this script is spawned in order to avoid race conditions).

worker.php

$stripeClient = new Stripe\StripeClient(/* your server secret key here */);

while (true) {
    $retriesCount = 0;
    $done = false;

    while (!$done) {
        $db->beginTransaction();

        try {
            $stmt = $db->prepare("
                SELECT *
                FROM stripe_event 
                WHERE executed_at IS NULL 
                AND status = 'pending' 
                ORDER BY id 
                LIMIT 1 
                FOR UPDATE
           ");

           $stmt->execute();
           $event = $stmt->fetch(\PDO::FETCH_ASSOC);

           if (!$event) {
               $db->commit();
               $done = true;
               continue;
           }

           if ($retriesCount > 2) {
               $db->prepare("UPDATE stripe_event SET status = 'failed' WHERE id = :id")
                  ->execute(['id' => $event['id']]);

               $db->commit();
               $done = true;
               continue;
           }

           $db->prepare("UPDATE stripe_event SET status = 'processing', executed_at = now() WHERE id = :id")
              ->execute(['id' => $event['id']]);

           $stripeEventObject = Stripe\Event::constructFrom(json_decode($event->getPayload(), true));

           switch ($stripeEventObject->type) {
               case Stripe\Event::INVOICE_CREATED:
               case Stripe\Event::INVOICE_UPDATED:
               case Stripe\Event::INVOICE_PAID:
               case Stripe\Event::INVOICE_PAYMENT_FAILED:
                    handleInvoiceEvent($stripeEventObject);
                    break;
               // list your events here for which you've subscribed for
               default:
                   throw new \Exception('Unsupported event type.');
            }

            $db->prepare("UPDATE stripe_event SET status = 'processed' WHERE id = :id")
               ->execute(['id' => $event['id']]);

            $db->commit();
            $done = true;
        } catch (\Throwable $exception) {
            $db->rollback();
            $retriesCount++;
        }
    }
}

function handleInvoiceEvent(Stripe\Event $stripeEvent) use ($stripeClient) {
    $invoice = $stripeEvent->data->offsetGet('object');

    // always fetch from Stripe API fresh entity
    try {
        $invoice = $stripeClient->invoices->retrieve($invoice->id);
    } catch(Stripe\Exception\ApiErrorException $stripeException) {
        // not found
        if ($stripeException->getHttpStatus() === 404) {
            $db->prepare('DELETE FROM invoice WHERE stripe_invoice_id = :stripe_invoice_id')
               ->execute(['stripe_invoice_id' => $invoice->id]);
            return;
        }

        throw $stripeException;
    }

    $invoiceFromDbStmt = $db->prepare('SELECT * FROM invoice WHERE stripe_invoice_id = :stripe_invoice_id');
    $invoiceFromDbStmt->execute(['stripe_invoice_id' => $invoice->id]);
    $invoiceFromDb = $invoiceFromDbStmt->fetch(\PDO::FETCH_ASSOC);

    if (!$invoiceFromDb) {
        $db->prepare('INSERT INTO invoice (stripe_invoice_id, status) VALUES (:stripe_invoice_id, :status)')
           ->execute(['stripe_invoice_id' => $invoice->id, 'status' => $invoice->status]);
    } else {
        $db->prepare('UPDATE invoice SET status = :status')->execute(['status' => $invoice->status]);
    }
}

NachoColl commented 11 months ago

Hi, also came into this topic while implementing a new subscription model and found no way to determine the order of events, making webhook data quite useless :( and only to be used as a trigger to fetch data from the Stripe side. Is that correct? any new solution around?

:: EDITED :: 2 cents

objective I want to SAVE in my DB all the events in the correct order. I don't want to use a pre-defined order of events ("customer.created goes before customer.updated" etc)

what we know

webhooks do not deliver events in any specific order
there is no property on the event data you can use to infer the relative order from past or future events to come to your webhook

first conclusion

no way you can use webhook and the received data alone to do it

solution

as proposed by @spiffytech, use the /events API, as it looks the only "ordered" list of events :(
we use webhook not to process data, but to trigger an /events fetch
big big big issues:
- /events fetch seems to have a fetch limit (100 events)
  - meaning, if your system has for example 1M invoices created every second (for analysis), your fetch of the last 100 (on any of the 1M webhook events) may not end listing all the created events..
- inneficient, obvious.
- fetching data more frequently than your events creation rate (and not use webhooks at all) may work, but that will be even more inneficient

damm

I will go for it for now - webhook + using /events API to order the events on my side. but damm, this is not a good solution.
@tomer-stripe any proposal/corection?

Moongazer commented 9 months ago

Since days I'm facing the same problem which gives me headache how to solve it in a proper way. Its really a nightmare, and kind-of sad, because I generally really like the Stripe system (Very reliable SDK, amazing Documentations and examples compared to many other companies/providers, great interface,....). But also I understand, that's kind of impossible to send multiple HTTP requests in a small time-frame and guarantee that they will arrive in the right order. The protocol is not made for it.

Currently I'm trying to implement a different approach by using a state machine for subscriptions where each event executes a defined transition. My hope is, that the event order will not matter since the transition will just fail if the subscription is in a wrong state. Because events are retried on Stripe side after a while, sooner or later the subscription moves into the right state. Not sure if it will work out, can't wrap my mind fully around it to plan it ahead in every detail.

Katerlad commented 8 months ago

Okay I may be overthinking this, but I wanted to see what some of you guys think about the solution below, still learning the ropes here.

SETUP

I will be using PaymentIntent Service for my examples.

In my Database I will be saving every event that happens for a payment intent as well as having a up to date paymentintentrecord with the latest event id as foreign key.

LOGIC

When a Webhook is triggered in my API, I will fetch all events logged for the specific payment intent from my database, check to see if the incoming event from stripe has already been logged (In case of duplicates).

After checking for duplicate, I evaluate whether the created time on the incoming event, is newer than the latest event logged on the payment intent record, if so log the event, and update the PaymentIntentRecord to reflect that the new incoming event is the latest.

If the incoming event and the latest event have the same creation time, I then call the stripe /events API to get a list of all payment intent events with the same creation time, filter to show only the ones for that specific PaymentIntent then compare both the incoming event and the latest event attached to my databases PaymentIntent record with the index order of the results from the api call.

This Resolves same timed creation dates, and if the new incoming event or the one already attached to my PaymentIntentRecord is the latest.

Reasoning

Because of the /Events api request limit, and just traffic from the stripe api in general I didnt want to poll the /Events api every time, and instead only call it if a event with the same creation time matched the latest event logged. This should help reduce traffic. I dont have real data yet, but what im reading, 90% of the time the creation times shouldnt be the same?

Concerns

I am still trying to adhere to the sending an OK Response instantly to Stripe, IS this really needed? Shouldnt I be able to do all of this logic and then send an OK response to stripe, just in case something on my END fails? I saw somewhere that the Stripe API would wait like 5 Minutes for a response before retrying, or a bad request. Shouldnt this be enough time to do any logic you need?
Any comments on this approach would be much appreciated.

alberduris commented 5 months ago

What a mess for something that should be trivial, I'm 🤯

alexandergunnarson commented 2 months ago

The perils of eventual consistency.

stripe / stripe-cli