omgnetwork / omg-childchain-v2

pronounced /Ch-ch/
Apache License 2.0

Implement inserting transactions into a forming block #131

Closed pgebal closed 3 years ago

pgebal commented 4 years ago

References to state `forming`: https://github.com/omgnetwork/childchain/issues/121#issuecomment-688687470

Inserting a transaction has to return the block number and tx index.

Specs to come.

pgebal commented 4 years ago

If we add a transaction counter to the :blocks table and read and update it for each new transaction, or if we count the transactions via COUNT, we're going to have serialization failures. Is there a way in Ecto or Postgres to have an auto-incrementing counter that keeps track of the number of transactions for each block? @mederic-p

mederic-p commented 4 years ago

Would it work to simply add an UPDATE BLOCK SET tx_count=tx_count+1 WHERE .... to our multi?
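That increment, expressed as part of an Ecto.Multi, might look like the following sketch. Module and field names (`Block`, `Repo`, the changeset) are assumptions for illustration, not taken from the codebase:

```elixir
import Ecto.Query

# Hypothetical sketch: insert the child-chain transaction and bump the
# block's counter atomically inside one Ecto.Multi / DB transaction.
def insert_with_count(block, transaction_changeset) do
  Ecto.Multi.new()
  |> Ecto.Multi.insert(:transaction, transaction_changeset)
  |> Ecto.Multi.update_all(
    :bump_tx_count,
    from(b in Block, where: b.id == ^block.id),
    inc: [txcount: 1]
  )
  |> Repo.transaction()
end
```

`update_all` with `inc:` emits a single `UPDATE ... SET txcount = txcount + 1`, so the increment itself is atomic; the question discussed below is what happens once the updated value has to be read back.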

pgebal commented 4 years ago

It would work, but I believe we'll have transaction serialization failures for some requests. I haven't checked how it works in practice, but please take a look at the Serializable Isolation Level section of https://www.postgresql.org/docs/9.1/transaction-iso.html. I believe that if we have a couple of those updates running concurrently, some of them will fail. But maybe I don't understand it well enough.

mederic-p commented 4 years ago

If we put all the changes in a multi (it should be like this anyway), then it shouldn't be an issue, right?

pgebal commented 4 years ago

@mederic-p The DB will be consistent for sure. I thought we might run into serialization failures because of db transaction ordering issues. In the db transaction we have to:

  1. Update txcount in :blocks.
  2. Read txcount from :blocks.
  3. Set it as tx_index for the incoming child chain transaction.

I think the read in step 2 might cause serialization problems for concurrent db transactions, meaning only one will execute and the others will roll back.

But I might be totally wrong. I'll just code it and show you the code.
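The three steps above, written as a single Ecto.Multi, could be sketched as follows (schema and function names are assumptions; the point is to show where the suspect read sits):

```elixir
import Ecto.Query

# Hypothetical sketch of the three steps inside one DB transaction.
def insert_transaction(tx_changeset) do
  Ecto.Multi.new()
  # 1. Bump txcount on the currently forming block.
  |> Ecto.Multi.update_all(
    :bump,
    from(b in Block, where: b.state == "forming"),
    inc: [txcount: 1]
  )
  # 2. Read the updated txcount back. This read-after-write is the
  #    suspected source of serialization conflicts when several of
  #    these transactions run concurrently.
  |> Ecto.Multi.run(:block, fn repo, _changes ->
    {:ok, repo.one!(from(b in Block, where: b.state == "forming"))}
  end)
  # 3. Use it as the tx_index of the incoming child-chain transaction
  #    (txcount now counts the new transaction, hence the -1).
  |> Ecto.Multi.insert(:transaction, fn %{block: block} ->
    Ecto.Changeset.change(tx_changeset,
      block_id: block.id,
      tx_index: block.txcount - 1
    )
  end)
  |> Repo.transaction()
end
```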

mederic-p commented 4 years ago

I see. What if, instead of storing a tx_count in blocks, we just have a tx_index on transactions that resets for every new block, similarly to:

Does it help?

pgebal commented 4 years ago

That would be perfect. I'll try to do it.

pgebal commented 4 years ago

Some thoughts on calculating the transaction index: to have a transaction index for a submitted transaction, we have to know the order of transactions included in the currently forming block. That problem is hard to solve because of concurrent database transactions. I decided to go for a solution that uses a lock: in each db transaction that inserts a child-chain transaction into the database, we lock (using SELECT ... FOR UPDATE) on the currently forming block. That introduces a performance bottleneck, but I don't see any other way to do it if we want to be consistent with the old API. A raw draft (not for review) is here: https://github.com/omgnetwork/childchain/pull/133/files
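The locking step can be sketched with Ecto's `lock/3` query option, which emits the `FOR UPDATE` clause visible in the logs below (schema names are assumed):

```elixir
import Ecto.Query

# Sketch: take a row lock on the currently forming block, so concurrent
# inserts queue up behind each other instead of racing on tx_index.
def get_forming_block_for_update(repo) do
  repo.one(
    from(b in Block,
      where: b.state == "forming",
      lock: "FOR UPDATE"
    )
  )
end
```

Because every insert serializes on this one row, throughput is bounded by how fast a single locked transaction can commit, which is the bottleneck discussed below.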

pgebal commented 4 years ago

When implementing this I ran into performance issues. Inserting 15 000 transactions without this change takes 8 s; with this change implemented it takes 130 s. All durations are from my PC. The SQL reported by the Ecto logs is the following:

begin []
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK source="blocks" db=0.3ms
SELECT b0."id", b0."hash", b0."state", b0."nonce", b0."blknum", b0."txcount", b0."tx_hash", b0."formed_at_ethereum_height", b0."submitted_at_ethereum_height", b0."gas", b0."attempts_counter", b0."inserted_at", b0."updated_at", b0."node_inserted_at", b0."node_updated_at" FROM "blocks" AS b0 WHERE (b0."state" = $1) FOR UPDATE ["forming"]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK source="transactions" db=0.4ms
SELECT max(t0."tx_index") FROM "transactions" AS t0 WHERE (t0."block_id" = $1) [1]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.6ms
INSERT INTO "transactions" ("block_id","tx_bytes","tx_hash","tx_index","tx_type","node_inserted_at","node_updated_at") VALUES ($1,$2,$3,$4,$5,$6,$7) RETURNING "id" [1, <<248, 185, 248, 67, 184, 65, 34, 109, 90, 164, 172, 129, 39, 183, 70, 120, 39, 219, 215, 91, 178, 153, 46, 219, 242, 81, 43, 252, 249, 206, 244, 153, 113, 81, 245, 51, 1, 65, 99, 31, 169, 155, 240, 116, 133, 204, 159, 18, ...>>, <<234, 216, 89, 121, 16, 159, 184, 21, 48, 57, 42, 76, 202, 54, 203, 123, 17, 47, 180, 151, 57, 199, 132, 78, 11, 175, 190, 158, 36, 124, 231, 115>>, 0, 1, ~N[2020-09-29 15:32:01], ~N[2020-09-29 15:32:01]]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.2ms
INSERT INTO "outputs" ("creating_transaction_id","output_data","output_type","state","node_inserted_at","node_updated_at") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "id" [1, <<237, 1, 235, 148, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 148, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1>>, 1, "pending", ~N[2020-09-29 15:32:01], ~N[2020-09-29 15:32:01]]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK db=1.0ms
UPDATE "outputs" SET "spending_transaction_id" = $1, "node_updated_at" = $2 WHERE "id" = $3 [1, ~N[2020-09-29 15:32:01], 1]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.1ms
commit []

If I do not lock the block with FOR UPDATE, I get more or less the same performance. Same if I push setting tx_index into database triggers (then I get the following SQL):

begin []
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK source="blocks" db=0.2ms
SELECT b0."id", b0."hash", b0."state", b0."nonce", b0."blknum", b0."txcount", b0."tx_hash", b0."formed_at_ethereum_height", b0."submitted_at_ethereum_height", b0."gas", b0."attempts_counter", b0."inserted_at", b0."updated_at", b0."node_inserted_at", b0."node_updated_at" FROM "blocks" AS b0 WHERE (b0."state" = $1) FOR UPDATE ["forming"]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.9ms
INSERT INTO "transactions" ("block_id","tx_bytes","tx_hash","tx_type","node_inserted_at","node_updated_at") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "id" [1, <<248, 185, 248, 67, 184, 65, 34, 109, 90, 164, 172, 129, 39, 183, 70, 120, 39, 219, 215, 91, 178, 153, 46, 219, 242, 81, 43, 252, 249, 206, 244, 153, 113, 81, 245, 51, 1, 65, 99, 31, 169, 155, 240, 116, 133, 204, 159, 18, ...>>, <<234, 216, 89, 121, 16, 159, 184, 21, 48, 57, 42, 76, 202, 54, 203, 123, 17, 47, 180, 151, 57, 199, 132, 78, 11, 175, 190, 158, 36, 124, 231, 115>>, 1, ~N[2020-09-29 16:05:49], ~N[2020-09-29 16:05:49]]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.2ms
INSERT INTO "outputs" ("creating_transaction_id","output_data","output_type","state","node_inserted_at","node_updated_at") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "id" [1, <<237, 1, 235, 148, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 148, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1>>, 1, "pending", ~N[2020-09-29 16:05:49], ~N[2020-09-29 16:05:49]]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.6ms
UPDATE "outputs" SET "spending_transaction_id" = $1, "node_updated_at" = $2 WHERE "id" = $3 [1, ~N[2020-09-29 16:05:49], 1]
[debug] module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.1ms
commit []

Basically, each of these transactions takes a couple of milliseconds, plus there is probably some overhead from using Ecto.Multi.
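The trigger variant mentioned above could look roughly like this migration. It is a sketch: the trigger body mirrors the `SELECT max(t0."tx_index")` query from the logs, and the migration module name is invented:

```elixir
defmodule Childchain.Repo.Migrations.AddTxIndexTrigger do
  use Ecto.Migration

  # Hypothetical migration: assign tx_index in the database itself, so
  # the application no longer has to read it back before inserting.
  def up do
    execute("""
    CREATE OR REPLACE FUNCTION set_tx_index() RETURNS trigger AS $$
    BEGIN
      SELECT COALESCE(MAX(tx_index), -1) + 1
        INTO NEW.tx_index
        FROM transactions
       WHERE block_id = NEW.block_id;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    """)

    execute("""
    CREATE TRIGGER transactions_set_tx_index
      BEFORE INSERT ON transactions
      FOR EACH ROW EXECUTE PROCEDURE set_tx_index();
    """)
  end

  def down do
    execute("DROP TRIGGER transactions_set_tx_index ON transactions;")
    execute("DROP FUNCTION set_tx_index();")
  end
end
```

Note that on its own the trigger does not prevent two concurrent inserts from computing the same MAX; it still relies on the block row lock (or a unique index on (block_id, tx_index)) for correctness.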

InoMurko commented 4 years ago

Perhaps there's a different angle from which this could be done. Let me first understand your angle :)

There are two options from your #133, afaik:

  1. On each insert (transaction insert) you check if there's a block :forming and, if there is one, you use its blknum.
  2. On each insert (transaction insert) you check if there's a block :forming and, if there is none, you insert a plasma block and use its blknum.

Correct?

pgebal commented 4 years ago

Almost: what you said, plus if the :forming block exceeds the transaction limit, I have to insert a plasma block and use its blknum.

InoMurko commented 4 years ago

Okay, so there's another step happening after the above: insert_block_if_transaction_limit_exceeded/2, which seals a block after the previous step in case we reached the max number of transactions in a block.

Right? It seems to me that the previous step and this one (insert_block_if_transaction_limit_exceeded) could be just one step. For example, get_forming_block_for_update could also check whether there's room for the incoming transaction (if not, create a new block, just like when there's no :forming block). And then the full block's state change towards pending_submission could be done async by a GenServer process.
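The merged step suggested here could be sketched as follows. The function name comes from the thread; the limit check, `Block.new_forming/0`, and the schema fields are assumptions:

```elixir
import Ecto.Query

# Sketch of the merged step: fetch-and-lock the forming block and, if
# there is none or it is already full, insert a fresh forming block.
# 65_536 matches the ~65k transactions per block mentioned below.
def get_forming_block_for_update(repo) do
  max_tx_count = 65_536

  forming =
    repo.one(from(b in Block, where: b.state == "forming", lock: "FOR UPDATE"))

  case forming do
    nil ->
      # No forming block yet: open a new one and use its blknum.
      repo.insert!(Block.new_forming())

    %{txcount: n} when n >= max_tx_count ->
      # Block is full: open a new one; sealing the old block (state
      # change towards pending_submission) can happen asynchronously.
      repo.insert!(Block.new_forming())

    block ->
      block
  end
end
```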

pgebal commented 4 years ago

I'll check tomorrow if I can merge those two steps; that's a good idea, let's see how it will work. When it comes to changing the block's state to :pending_submission, I don't think we'll win a lot here, since it's done only once a block is full, which is once for every ~65k transactions. But you are right that it shouldn't be done on transaction insert anyway (at least not calculating the block hash).

InoMurko commented 4 years ago

It's true that the step into :pending_submission does not happen on every insert, but the select read does. Let's see what it does perf-wise.

last_tx_index = repo.one(from(t in Transaction, where: t.block_id == ^block.id, select: max(t.tx_index))) || -1

pgebal commented 4 years ago

Thanks Ino. That select didn't change a lot. I stripped the SQL generated by Ecto down to

begin []
module=Ecto.Adapters.SQL function=log/4 QUERY OK source="blocks" db=0.1ms
SELECT b0."id", b0."hash", b0."state", b0."nonce", b0."blknum", b0."txcount", b0."tx_hash", b0."formed_at_ethereum_height", b0."submitted_at_ethereum_height", b0."gas", b0."attempts_counter", b0."inserted_at", b0."updated_at", b0."node_inserted_at", b0."node_updated_at" FROM "blocks" AS b0 WHERE (b0."state" = $1) ["forming"]
module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.2ms
INSERT INTO "transactions" ("block_id","tx_bytes","tx_hash","tx_index","tx_type","node_inserted_at","node_updated_at") VALUES ($1,$2,$3,$4,$5,$6,$7) RETURNING "id" [1, <<248, 185, 248, 67, 184, 65, 71, 208, 106, 12, 4, 232, 38, 117, 107, 86, 190, 59, 158, 103, 176, 245, 28, 222, 48, 196, 122, 254, 169, 41, 17, 205, 171, 188, 1, 234, 87, 41, 46, 6, 144, 165, 227, 108, 194, 116, 142, 231, ...>>, <<42, 213, 252, 87, 113, 111, 235, 54, 190, 173, 160, 80, 173, 55, 221, 243, 190, 123, 27, 36, 133, 215, 151, 218, 7, 112, 84, 36, 20, 171, 73, 19>>, 1, 1, ~N[2020-09-30 15:45:48], ~N[2020-09-30 15:45:48]]
module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.2ms
INSERT INTO "outputs" ("creating_transaction_id","output_data","output_type","state","node_inserted_at","node_updated_at") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "id" [2, <<237, 1, 235, 148, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 148, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1>>, 1, "pending", ~N[2020-09-30 15:45:48], ~N[2020-09-30 15:45:48]]
module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.2ms
UPDATE "outputs" SET "spending_transaction_id" = $1, "node_updated_at" = $2 WHERE "id" = $3 [2, ~N[2020-09-30 15:45:48], 3]
module=Ecto.Adapters.SQL function=log/4 QUERY OK db=0.1ms
commit []

which does not lock and doesn't keep track of tx_indexes (it always inserts 1 as the tx_index), and I'm getting about the same performance (around 10% better) as in the case of using SELECT ... FOR UPDATE and SELECT MAX(...). Currently I'm at 500-600 db transactions per second.

boolafish commented 4 years ago

Just an idea. Reading through the thread/discussion here, it seems the performance is bottlenecked by tx index incrementation (right?). What if we make tx index flagging async? So we first write the tx to the DB without an index and then wait for the tx index to be batch-flagged by another process. That reduces the need for the lock to however often the process batch-flags those tx indexes.

The API call can still be synchronous, I think. So the flow would be:

  1. Write tx to DB
  2. Wait for tx to get its index
  3. Return the data

The process that batch-flags the index probably needs to do it every X ms, where X is not too big (so that the API call can return the data without a long waiting time). We can probably try 50 or 100 ms +- some jitter (to avoid 2 childchain instances always trying to flag at the same time) and see if it helps.
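The batch flagger described above could be sketched as a GenServer. Everything here is hypothetical (module name, interval, jitter, and the shape of the batch UPDATE); it only illustrates the periodic-flagging idea:

```elixir
defmodule Childchain.TxIndexFlagger do
  use GenServer

  # Hypothetical: every ~100 ms (plus jitter), assign tx_index to all
  # transactions of the forming block that don't have one yet, in
  # insertion (id) order, with a single batch UPDATE.
  @interval_ms 100

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(state) do
    schedule()
    {:ok, state}
  end

  @impl true
  def handle_info(:flag, state) do
    flag_pending_indexes()
    schedule()
    {:noreply, state}
  end

  defp schedule do
    # Jitter so two childchain instances don't always flag at the same time.
    Process.send_after(self(), :flag, @interval_ms + :rand.uniform(20))
  end

  defp flag_pending_indexes do
    # Number unindexed rows after the current max tx_index of the
    # forming block; table/column names follow the logs above.
    Repo.query!("""
    UPDATE transactions SET tx_index = base.next + sub.rn - 1
      FROM (SELECT COALESCE(MAX(tx_index), -1) + 1 AS next
              FROM transactions
             WHERE block_id = (SELECT id FROM blocks WHERE state = 'forming')) AS base,
           (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
              FROM transactions
             WHERE block_id = (SELECT id FROM blocks WHERE state = 'forming')
               AND tx_index IS NULL) AS sub
     WHERE transactions.id = sub.id
    """)
  end
end
```

The API request would then insert the transaction without an index and poll (or subscribe) until the flagger assigns one, matching steps 1-3 above.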

mederic-p commented 4 years ago

Thanks @boolafish for the solution. I'm not totally sure that locking + tx_index is indeed the bottleneck; to me it feels like Postgres is the bottleneck. However, if @pgebal could try a POC of your solution and see how it performs, that would be great!

boolafish commented 4 years ago

lol, I didn't read that @pgebal mentioned "always inserts 1 as tx_index". But I think what I mean in general is that we can probably batch-update everything this PR is aiming to do on the tx instead. Then we can hope it to be a bit more efficient by having fewer things locked and by batch-updating stuff.

pgebal commented 4 years ago

Thanks Andy. Yes, I think we will eventually have to go with doing it async. That introduces a trade-off: requests will take longer, and the job that determines tx_indexes and the block number will possibly still have to lock on some db row. But we won't know how it works until we try it.

To move forward with the development of the childchain, Mederic and I decided to merge the version that handles around 500-600 submits per second and properly handles concurrent transactions.

pgebal commented 4 years ago

Just to keep everything in context: when we use a single-transaction insert, like currently in master, we can handle 5 times more transactions, about 2500 per second on my local setup, compared to 500-600.