Closed rkachowski closed 1 year ago
Hi @rkachowski,
Version 0.3.0 was released and it changes how the library works internally. I believe the change will help this situation but I'm uncertain of the full impact.
I would love your feedback!
Please note that there are breaking changes specifically around configuration. None of your existing application code is impacted.
After thinking further, this will still be a problem but I believe it can be handled. Thanks!
@rkachowski,
I'm unable to reproduce the error. Can you provide a bit more of an example for your Multi? When the Multi process fails, is it failing because something else first raised an exception inside the Multi?
In general, a Multi should only be called inside an Fly.Postgres.rpc_and_wait/3
call. The reason is, the Repo.transaction
call (which should be Repo.Local.transaction
because it doesn't makes sense to proxy a transaction call in a stand-alone process) needs to be executed on the primary database.
If the whole Multi is executed on the primary, then any transaction rollbacks happen there and it's all fine.
I would like to reproduce this so I can catch the error and give a more instructive error message about what to do.
I'd also like to update the README docs with an example using a Multi and explicit transaction control. Basically, we can't start a transaction on the local replica but proxy write operations to the primary, execute queries, and expect it to be inside a single transaction.
I'd love an example of the types of operations you are doing, the Multi structure, and how it is being executed (RPC proxied or not).
Hey @brainlid let me update the library and try again.
In the above situation something does indeed fail first, the application code then intends to handle a failed transaction but does not get access to this due to the above in_failed_sql_transaction
error triggered by the select CAST(pg_current_wal_insert_lsn() AS TEXT)
query causing the process to die.
In a related issue, Fly.Repo.transaction
will actually execute on the local repository instead of on the remote (as described in the documentation) - I have created a pull request to address this. https://github.com/superfly/fly_postgres_elixir/pull/32
fwiw I have patched this locally to forward the error to the application upon execution. I'm not particularly happy about the implementation but it works for me https://github.com/superfly/fly_postgres_elixir/commit/7af3777f73dbad3ae492a418bb7811fc8f295c11
Hi @rkachowski,
Thanks for the PR, I'll check it out. I'd like to understand more about the situation that's demonstrates the problem. Even a comment/pseudo example would be helpful.
fwiw I have patched this locally to forward the error to the application upon execution. I'm not particularly happy about the implementation but it works for me 7af3777
I don't recommend this solution. It changes the result of the proxied function that we (as the library) know nothing about. The executed function may correctly and intentionally perform it's action and return an {:error, "something else you should know"}
. The forked code will break that.
I really want to understand the situation that's causing the problem. With that situation and a way to reproduce it, I would use a try..rescue
around the call to get the WAL value and handle it that way.
Under the current implementation, when the executed function returns an {:error, reason}
tuple this value is passed directly into wait_for_lsn
resulting in the error outlined above.
I really want to understand the situation that's causing the problem. With that situation and a way to reproduce it, I would use a try..rescue around the call to get the WAL value and handle it that way.
The wait_for_lsn
function does not expect a failed result, which is in the set of possible outcomes for a database call. rpc_and_wait
has nothing to wait for if the rpc has failed, which is a valid state for the previous operation and currently not a handled case. There is no WAL value to wait for when there is no change to replicate.
I am not happy with the solution for the reason you mentioned as changing the signature has significant consequences, yet the current api is based around a false assumption that there is always something to wait on. There is no consideration of transaction failure.
Hi @rkachowski,
I'm using version 0.3.0
and try as I might, I can't create a situation where it fails. The wait_for_lsn
function was refactored in v0.3.0 and doesn't exist anymore.
I'm creating an explicit transaction and rolling it back but not getting errors. I tried doing it on the primary through a Fly.Postgres.rpc_and_wait
and without that. I've tried doing it through a failed Ecto.Multi
.
Can you please upgrade to 0.3.0
and provide some code samples with problematic usage? It doesn't have to be compiling code even.
Here's my usage. In a LiveView from a replica region:
def add_template_locations_transaction_fail(location_ids, account_id) do
# NOTE: This returns an `{:error, :posting_not_allowed}` result.
Repo.transaction(fn ->
location_ids
|> TemplateLocations.locations()
|> Enum.each(fn location ->
{:ok, _location} = Locations.create_location(account_id, Location.to_params(location))
end)
Logger.warn("INTENTIONALLY FAILED TRANSACTION")
Repo.rollback(:posting_not_allowed)
# `rollback/1` stops execution, so code here won't be run
end)
end
# Function trigged by the LiveView that performs the inserts that fails.
def handle_info({:add_template_locations, location_ids, "multi"}, socket) do
# NOTE: I tried this version as well where it executes it directly
# on the local replica without going to the primary.
# result = add_template_locations_transaction_fail(location_ids, socket.assigns.current_account.id)
result =
Fly.Postgres.rpc_and_wait(__MODULE__, :add_template_locations_transaction_fail, [
location_ids,
socket.assigns.current_account.id
])
IO.inspect(result, label: "FUNCTION RESULT?")
{:noreply, socket}
end
It's possible that the updated version solved the issue but it's also possible that I'm just missing something that you're doing. I'd love to help resolve the issue but I need some help.
Hey @brainlid, thanks for your attention. I will try to update and create a reduced test case when I get the chance, the actual issue I'm encountering is in quite a complicated transaction / multi and I want to get it down to size.
The error still seems to exist on 0.3.0
1) test register_owner/1 returns error if owner already exists (Explorer.AdminTest)
test/explorer/admin_test.exs:36
** (Postgrex.Error) ERROR 25P02 (in_failed_sql_transaction) current transaction is aborted, commands ignored until end of transaction block
query: select CAST(pg_current_wal_insert_lsn() AS TEXT)
code: assert {:error, changeset} = Admin.register_owner(@valid_registration_params)
stacktrace:
(ecto_sql 3.7.1) lib/ecto/adapters/sql.ex:760: Ecto.Adapters.SQL.raise_sql_call_error/1
(fly_postgres 0.3.0) lib/lsn/lsn.ex:78: Fly.Postgres.LSN.current_wal_insert/1
(fly_postgres 0.3.0) lib/fly_postgres.ex:200: Fly.Postgres.__rpc_lsn__/4
(fly_postgres 0.3.0) lib/fly_postgres.ex:170: Fly.Postgres.rpc_and_wait/4
(explorer 0.0.1) lib/explorer/admin.ex:58: anonymous fn/1 in Explorer.Admin.register_owner/1
(ecto_sql 3.7.1) lib/ecto/adapters/sql.ex:1021: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
(db_connection 2.4.2) lib/db_connection.ex:1562: DBConnection.run_transaction/4
test/explorer/admin_test.exs:38: (test)
Here is a clunky test case that causes the same issue
describe "fly postgres" do
test "shouldn't allow duplicate names" do
create_table = "CREATE TABLE awesome_names (name TEXT)"
unique_index = "CREATE UNIQUE INDEX unique_names_constraint ON awesome_names (name)"
Ecto.Adapters.SQL.query!(Repo.Local, create_table, [])
Ecto.Adapters.SQL.query!(Repo.Local, unique_index, [])
defmodule Name do
use Ecto.Schema
import Ecto.Changeset
@primary_key false
schema "awesome_names" do
field :name, :string
end
def changeset(name, params \\ %{}) do
name
|> cast(params, [:name])
|> unique_constraint(:name, name: :unique_names_constraint)
end
def insert(name) do
%Name{}
|> Name.changeset(%{name: name})
|> Repo.insert()
end
end
Name.insert("admin")
# process will die here due to `(Postgrex.Error) ERROR 25P02 (in_failed_sql_transaction)`
# and error value will not be made available to the application
assert {:error, changeset} = Repo.transaction(fn ->
with {:ok, inserted} <- Name.insert("admin") do
inserted
else
{:error, error} ->
Repo.rollback(error)
end
end)
end
end
@rkachowski,
Thanks for the test case! I'll dig into it. :+1:
@rkachowski,
Thank you for the error replication! It was very helpful.
I believe I have a fix ready and hope you can test it out before I make a new release.
To test it out, in your mix.exs
change it like this...
# {:fly_postgres, "~> 0.3.0"},
{:fly_postgres, git: "https://github.com/superfly/fly_postgres_elixir.git", branch: "me-handle-db-errors"},
Then run:
mix deps.unlock fly_postgres
mix deps.get
And run your application.
Thanks!
Hey @brainlid, the fix works and the tests pass. Thanks again!
Fixed in version v0.3.1
Description
When an sql transaction fails, fly_postgres_elixir will still try to call
select CAST(pg_current_wal_insert_lsn() AS TEXT)
in the context of the transaction, resulting in the error** (Postgrex.Error) ERROR 25P02 (in_failed_sql_transaction) current transaction is aborted, commands ignored until end of transaction block
This results in the process crashing and prevents the error response result of an
Ecto.Multi
from being made available to the application code and being handled appropriately.Expected result
Ecto.Multi
works as with defaultApp.Repo.Local
on the primary{:error, stage, error_response, changes_so_far}
tupleActual result
Notes
Failed test output