nothingworksinc / ticketbeast

Back to the lesson videos:
https://course.testdrivenlaravel.com/lessons
553 stars 11 forks source link

Race condition between finding and reserving tickets? #56

Closed jusahah closed 7 years ago

jusahah commented 7 years ago

Is there a subtle race condition here (Concert.php):

public function reserveTickets($quantity, $email)
    {
        $tickets = $this->findTickets($quantity)->each(function ($ticket) {
            $ticket->reserve();
        });
        return new Reservation($tickets, $email);
    }

public function findTickets($quantity)
    {
        $tickets = $this->tickets()->available()->take($quantity)->get();
        if ($tickets->count() < $quantity) {
            throw new NotEnoughTicketsException;
        }
        return $tickets;
    }

What if between finding and reserving an another HTTP request for reservation comes in and hits the DB... doesn't it see those same tickets available, thus returning them for reservation?

Like:

Am I missing something?

adamwathan commented 7 years ago

Hey @jusahah you're not missing anything, you're totally right 😊

There's a couple reasons I decided not to dedicate any lessons to that particular race condition:

  1. It's impossible to test as far as I've been able to tell (I've tried! 😩). There's no way to write a test that proves this race condition is a problem, so it's impossible to drive out the implementation with TDD. Here's a quote from the original TDD book:

    Security software and concurrency, for example, are two topics where TDD is insufficient to mechanically demonstrate that the goals of the software have been met. Although it's true that security relies on essentially defect-free code, it also relies on human judgment about the methods used to secure the software. Subtle concurrency problems can't be reliably duplicated by running the code.

  2. The solution (which uses pessimistic table locking) is actually really hard to explain to someone who doesn't already understand it. When I was learning how it worked, it took me hours and hours to build a mental model that really matched what was happening and that I actually understood.

    For example, people commonly think "locking" a table means no other connections can access that table until the lock is released.

    This isn't actually true; anyone can read the table even while another connection has acquired a lock. No one else can update it until the lock is released, but they can still read records into memory that the first connection is about to modify, and then update those records once the lock is released.

    This is mega counter-intuitive to most people because it feels like the lock is doing nothing; other connections are still able to stomp the data!

    But the reality is that the connection acquiring the lock isn't preventing connections that come after it from stomping it's data, it's asking the database, "hey, I intend to update the records I'm about to select, so please don't let me select them until everyone else is done with them."

Because of how difficult it is to get this to click for people (myself included), and because of the fact that we couldn't use TDD to drive out the solution, I decided not to go down that rabbit hole, because it likely would've been 30-40 minutes worth of content and the course is already absolutely massive in size.

That said, here's what the solution looks like:

public function reserveTickets($quantity, $email)
{
    $tickets = DB::transaction(function () use ($quantity) {
        $tickets = $this->tickets()->available()->take($quantity)->lockForUpdate()->get();

        if ($tickets->count() < $quantity) {
            throw new NotEnoughTicketsException;
        }

        $tickets->each->reserve();

        return $tickets;
    });

    return new Reservation($tickets, $email);
}

Cheers!

jusahah commented 7 years ago

Thanks for the code snippet and explanation!

I understand your reasoning for not going down the rabbit hole.

However, these deep insights (like what Laravel's table locking truly means) are the most valuable part of the course. At least for me.

svenluijten commented 7 years ago

But the reality is that the connection acquiring the lock isn't preventing connections that come after it from stomping it's data, it's asking the database, "hey, I intend to update the records I'm about to select, so please don't let me select them until everyone else is done with them."

This alone just made it click for me. Woah. Thanks for the extremely detailed answers you give on issues like this, Adam! 👍

peterlupu commented 7 years ago

Okay, hear me out, I'm gonna sketch a possible solution (not how to test it mind you, but how to solve it) and you guys tell me what you think.

Firstly, I'm not gonna use MySQL to do this quantity availability check, instead I'm going for Redis because it's atomic.

Let's assume we have the following tickets available:

Ticket A 5 pcs Ticket B 3 pcs Ticket C 4 pcs Ticket D 8 pcs

Let's assume two users, Joe and Mike. Joe wants the following tickets:

Ticket A 4 pcs Ticket B 2 pcs Ticket C 3 pcs

Mike wants:

Ticket B 1 pcs Ticket C 2 pcs Ticket D 3 pcs

We store the stock in Redis using SET (there might be something more suitable, but for the sake of simplicity I'm going to use this).

SET TicketA 5
SET TicketB 3
SET TicketC 4
SET TicketD 8

They both fire at the same time, but in reality, nothing happens at the same time as something else, something's always gonna happen first.

Before we do the availability check, we should order the tickets (by id, reference, number, alphabetically) so that we avoid 'deadlocks' (consider one user ordering tickets A and B, and another user ordering tickets B and A -- it makes it easier to order the tickets alphabetically before we check, so ticket A would fail first if there's no quantity available).

We can use Redis' DECRBY function, which will return the quantity left after we the decrement operation. For Joe:

DECRBY TicketA 4 => 1 pcs left, 1 >= 0 => OK, Joe can reserve that quantity, they are available DECRBY TicketB 2 => 1 pcs left, 1 >= 0 => OK DECRBY TicketC 3 => 1 pcs left, 1 >= 0 => OK

For Mike:

DECRBY TicketB 1 => 0 pcs left, 0 >= 0 => OK, Mike can reserve that quantity, they are available

DECRBY TicketC 2 => -1 pcs left, -1 < 0 => NOT OK, Mike cannot reserve a ticket, his order isn't placed. Everything is rolled back.

We MUST always hold on to the quantities that have been successfully DECRBY'ed, so that at any point if something fails, we need to increment those quantities back into Redis. In the above example, when Mike's validation fails for TicketC, we must increment the 1 pcs back into TicketB and 2 pcs back into TicketC.

We should also ensure that we sync the available quantities from the DB to Redis when creating the tickets in the backend. Also, we should register a shutdown function so that if our script fails, for any reason, the quantities that we already subtracted from Redis are put back in place. I do not think this is quite foolproof yet, maybe you guys can come up with a better solution.

A PHP Pseduo-code would look something like this:

function reserve($requestedQty)
{
    $remaining = Redis::DECRBY($this->ticketIdentifier, $requestedQty);

    if ($remaining < 0) {
        Redis::INCRBY($this->ticketIdentifier, $requestedQty);
        return false;
    }

    return true;
}

Also before ANYTHING of the above happens, we should simply check Redis and see if we have the requested quantities. Maybe building an array of [ticketId => availableQty] pairs would be pretty fast. It makes no sense to start doing DECRBY and INCRBY if we first use Redis' GET to check the available quantity.

Considering the example above, ticket reservation requests can come to Redis in any order, one of the two users will eventually fail.

I'm sorry if this doesn't make a lot of sense, or it's a bad approach, I wrote it pretty fast since the topic is hot right now.

I've used this method in production for quite a large scale web shop after different MySQL methods did not do the trick. The whole magic here, in my opinion is that we don't do a READ and then a WRITE, we instead use DECRBY which will use the most-up-to-date quantity available in Redis.

Thanks for taking the time to read this and I hope I hear back with thoughts :)

Peter

jusahah commented 7 years ago

@peterlupu

Adam's lockForUpdate()-based solution is much simpler.

Also, your solution introduces new race condition. Take this function:

function reserve($requestedQty)
{
    $remaining = Redis::DECRBY($this->ticketIdentifier, $requestedQty);

    if ($remaining < 0) {
        Redis::INCRBY($this->ticketIdentifier, $requestedQty);
        return false;
    }

    return true;
}

Lets say we have 5 tickets available. Thus Redis count is 5. Then following happens:

User B's reservation should succeed but it does not because of the above race condition. Nobody gets tickets.

Of course this is safer than the original race condition (at least no ticket is handed out twice), but ultimately the problem still persists...

EDIT: I see you mentioned that before doing DECRBY or INCRBY one should GET count. That might solve this race condition, or not. This gets very complicated very fast...

peterlupu commented 7 years ago

@jusahah No, I mentioned that we should firstly check if there are all the tickets available before we start doing any DECRBY and INCRBY. In your example, user A would fail without any DECRBY happening, user B would succeed.

jusahah commented 7 years ago

@peterlupu

Yeah you are right, missed that part, sorry. It is getting very complicated at that point for... no real benefit over MySQL's own locking mechanism?

peterlupu commented 7 years ago

@jusahah

When I tried the MySQL locking mechanism, the performance took a bit of hit. For instance, you're having a stock of 10,000 for each ticket and users are ordering let's say 30 tickets at a time, but with small amount of quantities for each tickets. It's sufficient for two users to want to order only one common ticket, the rest of 29 tickets being unique to each user. That would introduce a lock, making one user wait for the other.

  1. You have plenty of tickets to sell in this case, so a lock would not be the best case scenario.
  2. Only one ticket out of 30 is requested by both the users, so a lock, again, would not be the best case scenario.

Using the Redis approach, the case above would have no locking happening, allowing the users to both pass (almost) at the same time.

That was the issue that I tackled, and with Redis, although a bit more work needs to be done, I got the best results.

jusahah commented 7 years ago

@peterlupu

Still trying to wrap my head around this...and I just realized that the original issue with Adam's screencast code was not that non-existing tickets are being handed out; no, the issue was that two users could accidentally reserve the same ticket even though there are enough available tickets for both of them.

For example, lets say there are 100 000 tickets available.

User A wants just 1 ticket.

User B wants also just 1 ticket.

Now it is very clear that both users should get their reservations through... and they will... but its still possible they end up sharing the very same ticket! As far as I see the Redis solution does solve the problem with non-existing tickets (thats what the count-variable is for), but does not solve this side of the problem.

peterlupu commented 7 years ago

@jusahah Hmm, good point. My solution did not refer to the Ticket itself when reserving. I treat the quantity as a 'virtual' quantity, so which ticket gets reserved does not matter at this point. Later on in your code, after the validation is successful you can generate a ticket number, and so on, basically turning the virtual quantity into a concrete reservation.

Worth noting is that my method was also trying to stay as de-coupled as possbile from the whole reservation and order placement process.

My issue was that there were a lot of concurrent requests going on and I was trying to dismiss the users who were trying to get unavailable ticket quantities before the logic of the reservation/order was being fired. This way, the databases avoided unnecessary transactions, lock and deadlocks. Furthermore, if high traffic is a problem, another solution would be to do the data validation (name, address etc.), ticket quantity validation (with Redis like above) and then just queue the whole process so that a worker would come up and make reservations, assign an order number, sent of emails etc. This is very useful when you're trying to take some load off the database.

svenluijten commented 7 years ago

@plabbett that just looks like a spam/scam website.

adamwathan commented 7 years ago

Yeah, scary to see a file called Windows Setup.zip on some shared download page, what is that supposed to be @plabbett?

adamwathan commented 7 years ago

Looks like his account has been compromised by something spamming malware links, I've let him know 👍🏻

plabbett commented 7 years ago

Changed password, revoked all keys. Sorry for the mess.