superfly / fly_postgres_elixir

Library for working with local read-replica postgres databases and performing writes through RPC calls to other nodes in the primary Fly.io region.
https://hex.pm/packages/fly_postgres
Apache License 2.0
104 stars 10 forks source link

Execute transaction + rollback on primary. #32

Closed rkachowski closed 1 year ago

rkachowski commented 1 year ago

Description

Documentation for the Fly.Repo.transaction + Fly.Repo.rollback functions reference that they will be executed on the primary instance due to transactions being used to modify data, yet __exec_local__ is actually called internally when the function is invoked.

Fly.Repo.transaction/2 documentation


Runs the given function or Ecto.Multi inside a transaction.
  This defaults to the primary (writable) repo as it is assumed this is being
  used for data modification. Override to operate on the replica.

  See `Ecto.Repo.transaction/2` for full documentation.


This PR modifies these functions to call `__exec_on_primary__` instead to align the described functionality with the actual behavior.

Also the `Repo.load/2` function is added to the module. This function exists on `Ecto.Repo` and serves to map between structs + sets of columnar data. This doesn't actually hit the database so uses `__exec_local__` to forward to whatever repo is being used.
brainlid commented 1 year ago

We do not want the transactions to be executed on the primary. They need to be executed on the local DB.

The following code is valid and should work on the read-only replica. The goal is to let me query data with transactional consistency on the replica.

def build_active_customer_report(calendar_period) do
  MyApp.Repo.transaction(fn ->
    # select customers to include in report
    customers = Customers.list_active_for_period(calendar_period)
    # generate report for customer
    Reports.spending_during_period(customers, calendar_period)
  end)
end

It is important and valid to execute multiple SELECT queries inside a transaction specifically to get a consistent and fixed view of the data across multiple queries.

Transactions should always be run on the local database.

If a transaction should be run on the primary, then it should be explicitly proxied to the primary and executed there.

Fly.rpc_primary(__MODULE__, :build_active_customer_report, [calendar_period])
rkachowski commented 1 year ago

I understand the reasoning for this but the current behaviour (and what you have described) is the opposite of what the documentation for the function states.

https://github.com/superfly/fly_postgres_elixir/blob/main/lib/repo.ex#L341

 Runs the given function or Ecto.Multi inside a transaction.

      This defaults to the primary (writable) repo as it is assumed this is being
      used for data modification. Override to operate on the replica.

      See `Ecto.Repo.transaction/2` for full documentation.

You mention transactions with an expected usecase of data consistency on a replica database, and your documentation explicitly references transactions with an expected usecase of data modification on a primary database - these are incompatible aims.

In my experience a database transaction is typically used for handling a set of changes that you wish to deal with atomically, this may differ from what fly.io customers expect and in that case it would appear that the documentation is incorrect.

brainlid commented 1 year ago

Hi @rkachowski,

Thank you for pointing out the inconsistency of the documentation and what currently happens. I will look into it!

brainlid commented 1 year ago

The problem was solved in a different way in v0.3.1. The inconsistent docs that you pointed out were also updated.

Thanks!