ioxiocom / firedantic

Database models for Firestore using Pydantic base models.
BSD 3-Clause "New" or "Revised" License

Support firestore transactions/batches #36

Open gresnick opened 2 years ago

gresnick commented 2 years ago

https://firebase.google.com/docs/firestore/manage-data/transactions

Without this, cloud functions that share a trigger invariably enter a race condition.

gresnick commented 2 years ago

I would be happy to contribute this with some initial guidance

gmega commented 1 year ago

Indeed. This is a great project as you can get support for application-side schemas in Firestore while getting everything Pydantic has to offer (e.g. unlike other Firebase "ORMs" which have built their own stuff for schema definition), but without support for transactions we simply cannot adopt it.

antont commented 10 months ago

Any new thoughts here? I'm also considering Firedantic for our project, where we have until now (just a few weeks) used a simple home-baked db util that lets us use pydantic with Firebase quite nicely. It lacks a lot though.

I'm just worried that we might hit a wall somewhere with Firedantic.

I'd guess it's always possible to just use the firebase python sdk client etc. directly, bypassing Firedantic, e.g. for a batch op?

antont commented 10 months ago

> I'd guess it's always possible to just use the firebase python sdk client etc. directly,

Just to answer my own question: yes, it seems trivial to fall back to using the Client directly. I'm using it for more complex queries now, and I guess running batch updates etc. would work too.

lietu commented 10 months ago

It might not be too much work to accept an optional transaction parameter on the relevant methods, so that you could use @firestore.transactional around firedantic calls yourself and pass in the transaction. If this seems valuable to you, a PR could be interesting to see.
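A minimal sketch of what such an optional transaction parameter could look like. Everything here is an assumption for illustration: `FakeTransaction` is an in-memory stand-in (its `set` method only mirrors the name used by Firestore write batches and transactions), and `Subscriber.save(..., transaction=...)` is a hypothetical signature, not firedantic's actual API.

```python
from typing import Dict, List, Optional, Tuple


class FakeTransaction:
    """In-memory stand-in for a transaction: buffers writes until commit."""

    def __init__(self, db: Dict[str, dict]) -> None:
        self._db = db
        self._pending: List[Tuple[str, dict]] = []

    def set(self, doc_id: str, data: dict) -> None:
        # Nothing is written yet; the write is only queued.
        self._pending.append((doc_id, data))

    def commit(self) -> None:
        # Apply all queued writes in one go.
        for doc_id, data in self._pending:
            self._db[doc_id] = data
        self._pending.clear()


class Subscriber:
    """Hypothetical model, used only to show the optional argument."""

    def __init__(self, email: str) -> None:
        self.email = email

    def save(self, db: Dict[str, dict],
             transaction: Optional[FakeTransaction] = None) -> None:
        data = {"email": self.email}
        if transaction is not None:
            transaction.set(self.email, data)  # caller controls the commit
        else:
            db[self.email] = data              # plain write, as before


db: Dict[str, dict] = {}
Subscriber("a@example.com").save(db)           # immediate write

txn = FakeTransaction(db)
Subscriber("b@example.com").save(db, transaction=txn)
visible_before_commit = "b@example.com" in db  # still queued
txn.commit()
visible_after_commit = "b@example.com" in db
print(visible_before_commit, visible_after_commit)
```

The point of the design is that existing callers are unaffected, while callers who manage their own transaction decide when the writes become visible.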

antont commented 10 months ago

> Without this, cloud functions that share a trigger invariably enter a race condition.

What do you actually mean by this, BTW? I guess two functions that get triggered by the same thing, e.g. both listening for documents created in the same collection or whatever. I haven't happened to write such functions yet, as I just have a single kind of handler per event, but I guess that can happen easily.

lietu commented 10 months ago

Say you have cloud functions handling people submitting a form to add you to a newsletter list.

The cloud function both 1) adds you to a collection of newsletter subscribers and 2) updates statistics on subscribers per region: it extracts the list of subscribers, counts their totals per region based on e.g. the email address domain, and then saves the numbers to a collection containing the statistics.
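As a rough illustration of the statistics step, here is a sketch that counts subscribers per region. Using the top-level domain of the email address as the region is an assumption made for the example, and `subscribers_per_region` is a hypothetical helper name.

```python
from collections import Counter


def subscribers_per_region(emails):
    """Count subscribers per region, using the email's top-level
    domain (e.g. "fi", "se") as a rough proxy for the region."""
    regions = Counter()
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        regions[domain.rsplit(".", 1)[-1]] += 1
    return dict(regions)


print(subscribers_per_region(["a@example.fi", "b@example.fi", "c@example.se"]))
# {'fi': 2, 'se': 1}
```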

Now if your database can't perform these two actions as a single atomic operation, there's a decent chance that some day there will be a rare occurrence (the rarity depends heavily on the popularity of your service) where two people add themselves to the newsletter list at very nearly exactly the same time.

Now your two cloud functions will spin up, not knowing about each other and not synchronizing their work. Both will:

1) Add the user to the collection of newsletter subscribers
2) Extract the data
3) Calculate updated statistics
4) Store statistics

Now if we name these two users A and B, their requests might be processed in linear infinitely divisible time in this order:

.. so both entries were added to the list first, then they both calculated the statistics and updated the data - no problem.

But if the order instead is:

The end result will be wrong. A2 calculated the result before B1 added user B to the list. The request for B took that into account and saved the right numbers in B4, but A4 then wrote stale data to the DB afterwards. This is a race condition, which happens due to the inherent unpredictability of simultaneous actions, and can be made a bit more interesting by the unpredictability of the speed at which they end up being executed.
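A minimal sketch that replays two interleavings matching the descriptions above. The step numbers follow the 1) to 4) list, but the exact schedules `good` and `bad` are assumptions chosen to fit the text, and the "statistics" are reduced to a single total for brevity.

```python
def run(schedule):
    """Replay a schedule such as ["A1", "B1", ...] and return the stored total.

    Step 1 = add subscriber, 2 = read the subscriber list,
    step 3 = compute statistics, 4 = store statistics.
    """
    subscribers = []   # the shared "subscribers" collection
    stored_total = 0   # the shared "statistics" document
    snapshots = {}     # what each request read in step 2
    computed = {}      # what each request calculated in step 3

    for step in schedule:
        user, action = step[0], step[1]
        if action == "1":
            subscribers.append(user)
        elif action == "2":
            snapshots[user] = list(subscribers)
        elif action == "3":
            computed[user] = len(snapshots[user])
        elif action == "4":
            stored_total = computed[user]
    return stored_total


# Both users are added before anyone reads: the result is correct.
good = ["A1", "B1", "A2", "B2", "A3", "B3", "A4", "B4"]
# A reads before B is added, but A stores after B: the result is stale.
bad = ["A1", "A2", "B1", "B2", "B3", "B4", "A3", "A4"]

print(run(good))  # 2
print(run(bad))   # 1
```

Two subscribers exist in both runs, yet the second schedule leaves a total of 1 in the statistics.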

How you'd work around this is either 1) transactions, or 2) locks

Locks:

Final result is predictable and good.
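The lock-based approach can be sketched roughly as follows. This assumes an in-process `threading.Lock` as a stand-in for whatever shared lock separate cloud functions would actually need, and `handle_signup` is a hypothetical handler name.

```python
import threading

subscribers = []            # the shared "subscribers" collection
stats = {"total": 0}        # the shared "statistics" document
lock = threading.Lock()     # stand-in for a lock shared by all handlers


def handle_signup(user):
    # Only one handler at a time may run steps 1-4, so no other request
    # can add a subscriber between this request's read and its write.
    with lock:
        subscribers.append(user)          # 1) add subscriber
        snapshot = list(subscribers)      # 2) extract the data
        total = len(snapshot)             # 3) calculate statistics
        stats["total"] = total            # 4) store statistics


threads = [threading.Thread(target=handle_signup, args=(u,)) for u in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stats["total"])  # 2, regardless of which thread got the lock first
```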

Transactions are a bit more like:

This might not be exactly faithful to how it works out in practice, but this is roughly what race conditions are in general, and how these 2 different methods of solving the problem of race conditions work.
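One common way transactions achieve this is to detect a conflicting change at commit time and retry. The sketch below shows that generic optimistic pattern with an in-memory `Store`; it is an illustration under that assumption, not a description of how Firestore implements transactions internally, and all names are hypothetical.

```python
class Conflict(Exception):
    """Raised when the data changed between a transaction's read and commit."""


class Store:
    def __init__(self):
        self.subscribers = []
        self.total = 0
        self.version = 0   # bumped on every successful commit

    def begin(self):
        # A transaction remembers which version of the data it read.
        return {"version": self.version, "subscribers": list(self.subscribers)}

    def commit(self, txn, user):
        if txn["version"] != self.version:
            raise Conflict(user)          # someone else committed in between
        self.subscribers = txn["subscribers"] + [user]
        self.total = len(self.subscribers)
        self.version += 1


def sign_up(store, user, max_attempts=5):
    """Run the whole read-compute-write cycle, retrying on conflict."""
    for attempt in range(1, max_attempts + 1):
        txn = store.begin()
        try:
            store.commit(txn, user)
            return attempt
        except Conflict:
            continue
    raise RuntimeError("too many conflicts")


store = Store()
txn_a = store.begin()          # A reads first ...
sign_up(store, "B")            # ... but B commits in between
try:
    store.commit(txn_a, "A")   # A's commit is rejected instead of overwriting
    a_conflicted = False
except Conflict:
    a_conflicted = True
sign_up(store, "A")            # A retries from a fresh read and succeeds

print(a_conflicted, store.subscribers, store.total)  # True ['B', 'A'] 2
```

The stale write from the earlier example cannot happen here: A's first attempt fails outright, and the retry recomputes the total from current data.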

lietu commented 10 months ago

Also to add, locks are generally speaking a simpler thing to implement and comprehend, but come with their own scalability issues, which is partially why transactions are often preferred.

antont commented 10 months ago

> Say you have cloud functions handling people submitting a form to add you to a newsletter list.

Right-o, thanks for spelling it out. I think we currently avoid this by having such statistics-like things triggered by scheduled cloud functions, so that only one task runs at a time for the whole service. Functions triggered by user activity only touch their own data. I will check our ops with this in mind anyway, and keep an eye on it for later.

I may also have some time to add support for this, even before we need it, just to be prepared once the need hits. I'm curious whether @gresnick or @gmega have ideas about how it would look; or if you write something, I can at least test it.