timshannon / badgerhold

BadgerHold is an embeddable NoSQL store for querying Go types built on Badger

Transaction Conflict. Please retry error #46

Closed: xerenahmed closed this issue 2 years ago

xerenahmed commented 3 years ago

I get this error most of the time. Is this normal? If it is, how can I retry in code? Error:

panic: Transaction Conflict. Please retry

goroutine 10 [running]:
gitlab.com/nanovy/rent/server/database.(*BadgerDB).Test.func1(0xc0000240e0, 0x2, 0xc00000e640)
        /home/eren/Desktop/Nanovy/Rent/Server/database/connect.go:69 +0xf6
created by gitlab.com/nanovy/rent/server/database.(*BadgerDB).Test
        /home/eren/Desktop/Nanovy/Rent/Server/database/connect.go:62 +0x106
exit status 2

Code:


type Item struct {
    Id   uint64 `badgerhold:"key"`
    Name string
    Cost int
}

func (bdb *BadgerDB) Test() {
    db := bdb.Store
    db.Badger().DropAll()

    var wg sync.WaitGroup

    for i := 0; i < 100; i++ {
        wg.Add(1)
        i := i
        go func() {
            defer wg.Done()
            item := Item{
                Name: "Glass",
                Cost: i,
            }
            if err := db.Insert(badgerhold.NextSequence(), &item); err != nil {
                panic(err)
            }
        }()
    }

    wg.Wait()

    var items []Item
    if err := db.Find(&items, badgerhold.Where("Cost").Ge(98)); err != nil {
        panic(err)
    }
    spew.Dump(items)
}
aakarim commented 3 years ago

I'm having this issue too. I've put a for loop in to retry the transaction in the meantime. It fixes the problem, and retrying is also what BadgerDB recommends.

    for {
        err := db.Insert(t.getRegistryKey(), t.task)
        if err != nil {
            if err.Error() == badger.ErrConflict.Error() {
                GetLogger(ctx).Println("retrying transaction completion conflict")
                continue
            }
            return err
        }
        return nil
    }

Also, errors.Is() isn't working on badger.ErrConflict, so I'm manually comparing errors.

I don't get this error if I use Badger directly.

timshannon commented 3 years ago

@aakarim could you paste in some runnable code that will show the transaction conflict in badgerhold but not badger directly?

aakarim commented 3 years ago

My code is much simpler when I'm using Badger directly, so it's not a fair comparison, I guess.

if err := db.Update(func(txn *badger.Txn) error {
  err := txn.Set([]byte(task.getRegistryKey()), task.MarshalBytes())
  if err != nil {
      return err
  }

  return nil
}); err != nil {
    panic(err)
}

I can run this concurrently with no conflict issues.

timshannon commented 3 years ago

Does your data have an index? Have you tried using synchronous writes?

I think this issue stems from the fact that Badger doesn't sync writes by default, so when you are inserting or updating data in a batch, you can get conflicts.

https://dgraph.io/blog/post/badger-txn/
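
For reference, turning on synchronous writes looks roughly like the sketch below. It assumes badgerhold.Options still embeds badger.Options (so SyncWrites is promoted onto it); the data directory and the import's major version are placeholders you'd adjust.

package main

import (
    "log"

    "github.com/timshannon/badgerhold" // adjust to the badgerhold major version you use
)

func main() {
    options := badgerhold.DefaultOptions
    // badgerhold.Options embeds badger.Options, so Badger settings such as
    // Dir, ValueDir, and SyncWrites can be set directly on it.
    options.Dir = "/tmp/badgerhold-data"      // placeholder path
    options.ValueDir = "/tmp/badgerhold-data" // placeholder path
    options.SyncWrites = true                 // flush writes to disk synchronously

    store, err := badgerhold.Open(options)
    if err != nil {
        log.Fatal(err)
    }
    defer store.Close()
}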

aakarim commented 3 years ago

OK, I'll give that a go. It's quite bizarre that that option isn't set to true by default. Or maybe that's just my inexperience with low-level KV stores.

aakarim commented 3 years ago

Sadly that didn't work. I get as many transaction conflicts as before.

601566661cb5f 2021/01/30 14:00:07 stored
6015666611f87 2021/01/30 14:00:07 retrying transaction completion conflict
60156665dc3e8 2021/01/30 14:00:07 retrying transaction completion conflict
60156665e8713 2021/01/30 14:00:07 retrying transaction completion conflict
60156665ec037 2021/01/30 14:00:07 retrying transaction completion conflict
60156665b6b55 2021/01/30 14:00:07 retrying transaction completion conflict
60156666036eb 2021/01/30 14:00:07 finished processing task 60156666036eb
60156666036eb 2021/01/30 14:00:07 retrying transaction completion conflict
601566661f95e 2021/01/30 14:00:07 retrying transaction completion conflict
6015666605f49 2021/01/30 14:00:07 retrying transaction completion conflict
6015666611f87 2021/01/30 14:00:07 retrying transaction completion conflict

I'm also not batching writes myself. Is badgerhold doing that under the hood?

Here is my stored struct, including indexes.

type Task struct {
    TaskID        string `badgerhold:"unique"`
    JobID         int    `badgerhold:"index"`
    TimeCompleted sql.NullTime
    NumTries      int
    Errors        []TaskError
}
timshannon commented 3 years ago

Ah, so you do have an index. Since I don't have a working example I can't experiment and troubleshoot this. Can you try removing your index and see if you still get the transaction conflict?

aakarim commented 3 years ago

If I remove the index, the transaction conflicts are resolved, but the queries get slower (I guess that's obvious).

Here's a flame graph of my service over 8s: [flame graph image]

and here's the code that's taking up the bulk of that graph:

func (t *StoredTask) Complete(ctx context.Context, db *badgerhold.Store) error {
    // infinitely retry until this error doesn't show up
    for {
        t.task.TimeCompleted = sql.NullTime{
            Valid: true,
            Time:  time.Now(),
        }
        err := db.Update(t.getRegistryKey(), t.task)
        if err != nil {
            if err.Error() == badger.ErrConflict.Error() {
                GetLogger(ctx).Println("retrying transaction completion conflict")
                continue
            }
            return err
        }
        return nil
    }
}
timshannon commented 3 years ago

Yep, that puts it firmly as an issue in Badgerhold and not Badger itself.

I'll see if I can replicate this and figure out how to manage it. My guess is that it's fundamentally an issue with their serializable snapshot isolation.

Basically, across multiple transactions an index entry could have been updated more than once. This may take a bit to come up with a workaround.
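
To make that concrete, here's a rough sketch against Badger directly (hypothetical key name, import version adjustable) of the kind of read-modify-write that an index update implies: both transactions read the same index key before rewriting it, so whichever commits second fails with the conflict error.

package main

import (
    "fmt"
    "log"

    badger "github.com/dgraph-io/badger/v3" // adjust to your badger major version
)

func main() {
    db, err := badger.Open(badger.DefaultOptions("/tmp/badger-conflict-demo"))
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Stand-in for a single index entry that every writer has to touch.
    indexKey := []byte("index:Item:Name:Glass")

    // Two concurrent read-write transactions, both started before either commits.
    txn1 := db.NewTransaction(true)
    txn2 := db.NewTransaction(true)

    for _, txn := range []*badger.Txn{txn1, txn2} {
        // The read puts indexKey into the transaction's read set.
        if _, err := txn.Get(indexKey); err != nil && err != badger.ErrKeyNotFound {
            log.Fatal(err)
        }
        if err := txn.Set(indexKey, []byte("updated index entry")); err != nil {
            log.Fatal(err)
        }
    }

    fmt.Println("txn1 commit:", txn1.Commit()) // <nil>
    fmt.Println("txn2 commit:", txn2.Commit()) // Transaction Conflict. Please retry
}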

aakarim commented 3 years ago

Yes, in my code it gets worse when I have a lot of goroutines writing to the same index. Maybe it's time to use the WriteBatch API to gather all the index updates into one transaction?

https://pkg.go.dev/github.com/dgraph-io/badger#DB.NewWriteBatch
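
For reference, the raw WriteBatch API looks roughly like the sketch below (hypothetical keys, import version adjustable). One caveat: a WriteBatch is write-only, so it can't read back the existing index entries that badgerhold needs to update.

package main

import (
    "fmt"
    "log"

    badger "github.com/dgraph-io/badger/v3" // adjust to your badger major version
)

// batchWrite batches raw key/value writes and flushes them in bulk, without
// the read-set tracking that causes transaction conflicts.
func batchWrite(db *badger.DB) error {
    wb := db.NewWriteBatch()
    defer wb.Cancel()

    for i := 0; i < 100; i++ {
        key := []byte(fmt.Sprintf("item-%d", i)) // hypothetical keys
        if err := wb.Set(key, []byte("value")); err != nil {
            return err
        }
    }
    return wb.Flush() // commits everything that is still pending
}

func main() {
    db, err := badger.Open(badger.DefaultOptions("/tmp/badger-writebatch-demo"))
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if err := batchWrite(db); err != nil {
        log.Fatal(err)
    }
}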

theTardigrade commented 3 years ago

I've had this problem too. I fixed it by sending writes to a goroutine running the code below, so that only one write occurs at a time.

func (storeDatum *badgerholdStoreDatum) BatchManager() {
    for {
        func() {
            batchDatum := <-storeDatum.BatchChan

            defer storeDatum.mutex.RUnlock()
            storeDatum.mutex.RLock()

            var err error

            if storeDatum.IsStoreClosed {
                err = ErrBadgerholdStoreClosed
            } else {
                switch batchDatum.Event {
                case badgerholdBatchEventInsert:
                    err = storeDatum.Store.Insert(batchDatum.Key, batchDatum.Data)
                case badgerholdBatchEventTxInsert:
                    err = storeDatum.Store.TxInsert(batchDatum.Tx, batchDatum.Key, batchDatum.Data)
                case badgerholdBatchEventUpsert:
                    err = storeDatum.Store.Upsert(batchDatum.Key, batchDatum.Data)
                case badgerholdBatchEventUpdate:
                    err = storeDatum.Store.Update(batchDatum.Key, batchDatum.Data)
                }
            }

            if batchDatum.ErrChan != nil {
                batchDatum.ErrChan <- err
            }
        }()
    }
}

I'd love the issue to be fixed properly though (using Badger's own WriteBatch API, as @aakarim says), because Badgerhold is an awesome little database! I'm really impressed with how effective and easy it is to use.
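
In case it helps anyone copying the pattern, a call site might look roughly like the sketch below. The badgerholdBatchDatum struct and the QueueInsert wrapper are hypothetical names; the fields just mirror what the manager above reads.

// Hypothetical wrapper around the BatchManager above: package an insert as a
// batch datum, hand it to the manager goroutine, and wait for its error.
func (storeDatum *badgerholdStoreDatum) QueueInsert(key, data interface{}) error {
    errChan := make(chan error, 1)

    storeDatum.BatchChan <- &badgerholdBatchDatum{
        Event:   badgerholdBatchEventInsert,
        Key:     key,
        Data:    data,
        ErrChan: errChan,
    }

    return <-errChan
}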

timshannon commented 3 years ago

I gladly welcome PRs. I'm crazy busy with work at the moment. I don't have confidence that the WriteBatch API will help though, at least not without separating the writes to the index into a different transaction from the main data write. If you do that, then you'll run into the issue of one transaction failing while the other commits, which would leave you with dirty data.

If this were SQL Server or some other database, we'd solve this by running it in a serializable transaction.

My first thought at a solution is to simply retry if a transaction conflict is hit. The index will be up to date and correct (as long as the current index data is re-pulled), but it could be incredibly slow.
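
As a sketch of what I mean, something like the hypothetical helper below on every write path, capped at a bounded number of attempts rather than an infinite loop:

import (
    "errors"
    "time"

    badger "github.com/dgraph-io/badger/v3" // adjust to your badger major version
)

// retryOnConflict is a hypothetical helper, not badgerhold API: it retries an
// operation a bounded number of times when Badger reports a transaction conflict.
func retryOnConflict(attempts int, op func() error) error {
    var err error
    for i := 0; i < attempts; i++ {
        err = op()
        // If errors.Is doesn't match in your setup (see the earlier comments),
        // fall back to comparing err.Error() with badger.ErrConflict.Error().
        if err == nil || !errors.Is(err, badger.ErrConflict) {
            return err
        }
        // Small backoff so competing writers don't immediately collide again.
        time.Sleep(time.Duration(i+1) * 5 * time.Millisecond)
    }
    return err
}

A caller would then wrap each operation, e.g. retryOnConflict(10, func() error { return store.Update(key, &task) }).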

Like I said, I welcome PRs; otherwise I'll get to it as soon as I can.

TwiceII commented 3 years ago

Any news on this?

timshannon commented 3 years ago

This issue should be fixed by #61. Before I push it to master, could someone on this thread grab that branch and give it a test to make sure it resolves your issue?

Thanks

timshannon commented 2 years ago

I'm going to merge this soon. Everything looks good to me, but with so many people interested in this fix I assumed I'd have at least one other party take a look at it.

aakarim commented 2 years ago

Apologies for the lack of comms on this. We migrated away from badgerhold because we had a deadline and the performance hit was too much to bear at the time. I'll look into this on the next performance spike.