tidwall / buntdb

BuntDB is an embeddable, in-memory key/value database for Go with custom indexing and geospatial support
MIT License

feature request: data recovery tool #82

Open slingamn opened 2 years ago

slingamn commented 2 years ago

As the maintainer of Ergo, which uses BuntDB for storage, I've gotten two reports of corrupted databases. It is not clear to me in either case that the root cause of the corruption is a bug in Ergo or BuntDB --- I suspect some combination of operator error and instability in the filesystem/storage layer that BuntDB can't recover from. What's interesting is that in both cases it was possible to repair the corrupted database by hand. Here are the two examples (in both cases, sensitive user data has been redacted, which changes the lengths of the key and value fields; these are just illustrations of the type of corruption that occurred).

The first example looks to me like operator error: some writes appear to have been interleaved out of order, with a mysterious blank line interpolated:

--- a/ircd.db.original
+++ b/ircd.db.repaired
@@ -81717,33 +81717,20 @@ $36
 {"":"2021-09-08T15:50:18.70437466Z"}
 *3
 $3
 set
 $24
 account.lastseen REDACTED
 $37
 {"":"2021-09-08T16:08:48.801563053Z"}
 *3
 $3
-set
-$33
-account.lastseen REDACTED
-$37
-{"":"2021-09-08T15:30:18.492876331Z"}
-*3
-$3
-set
-$24
-account.lastseen REDACTED
-$37
-{"":"2021-09-08T15:47:48.702280638Z"}
-
 set
 $26
 account.lastseen REDACTED
 $37
 {"":"2021-09-08T16:20:48.857926672Z"}
 *3
 $3
 set
 $41
 account.lastseen REDACTED

The second example looks to me like a failure to truncate(2) in response to a partial write, which would cause BuntDB to panic (Ergo will log and ignore that panic by default --- unfortunately the logs are no longer available to confirm this theory). In this case, we know that the application was running on a relatively exotic storage layer. We see a partial write of an incomplete JSON object, followed by a *3 that tries to set a different key:

--- a/ircd.db.original
+++ b/ircd.db.repaired
@@ -662666,33 +662666,20 @@ $79
 *3
 $3
 set
 $22
 account.lastseen REDACTED
 $37
 {"":"2022-01-10T21:36:34.285821554Z"}
 *3
 $3
 set
-$21
-account.lastseen REDACTED
-$89
-{"REDACTED":"2022-01-10T21:26:37.555047*3
-$3
-set
-$22
-account.lastseen REDACTED
-$37
-{"":"2022-01-10T21:48:16.873796534Z"}
-*3
-$3
-set
 $24
 account.lastseen REDACTED
 $81
 {"":"2022-01-10T21:48:22.289254759Z","REDACTED":"2021-12-05T23:32:21.349661915Z"}
 *3
 $3
 set
 $26
 account.lastseen REDACTED
 $37

Two data points isn't that much to go on, but it seems like there may be a class of data corruption cases that can be partially corrected by skipping over a damaged section of the file. In such cases it would seem desirable to have a tool (a separate executable, probably) that can skip those sections, report on the apparent extent of the damage, and produce a new, valid file. (In both of the cases here, there would have been no actual data loss because the damaged values were subsequently overwritten with new values.)
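
Concretely, I imagine such a tool doing roughly what the hand repairs did: parse the file command by command, and when a command fails to parse, scan forward to the next plausible command header and resume from there. Here is a rough, untested sketch of that idea against the *N / $len format visible in the diffs above (everything in it, from the names to the 64 MiB size cap to the resync rule, is illustrative; it is not real BuntDB or buntrepair code):

// Hypothetical sketch of a standalone repair pass over a BuntDB append-only
// file, based on the layout visible in the diffs above: a "*<nargs>" header,
// then "$<len>"/data pairs. It copies commands that parse cleanly,
// resynchronizes at the next "*" header after a damaged region, and reports
// how many regions it skipped.
package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "strconv"
    "strings"
)

// readCommand parses one command at the reader's current position and returns
// its raw bytes, or an error if the bytes there are malformed or truncated.
func readCommand(r *bufio.Reader) ([]byte, error) {
    header, err := r.ReadString('\n')
    if err != nil {
        return nil, err // a clean EOF at a command boundary shows up here
    }
    raw := []byte(header)
    h := strings.TrimRight(header, "\r\n")
    nargs, aerr := strconv.Atoi(strings.TrimPrefix(h, "*"))
    if !strings.HasPrefix(h, "*") || aerr != nil || nargs <= 0 {
        return nil, fmt.Errorf("bad command header %q", h)
    }
    for i := 0; i < nargs; i++ {
        sizeLine, err := r.ReadString('\n')
        if err != nil {
            return nil, err
        }
        raw = append(raw, sizeLine...)
        s := strings.TrimRight(sizeLine, "\r\n")
        n, serr := strconv.Atoi(strings.TrimPrefix(s, "$"))
        if !strings.HasPrefix(s, "$") || serr != nil || n < 0 || n > 64<<20 { // 64 MiB sanity cap
            return nil, fmt.Errorf("bad length line %q", s)
        }
        data := make([]byte, n)
        if _, err := io.ReadFull(r, data); err != nil {
            return nil, err
        }
        raw = append(raw, data...)
        term, err := r.ReadString('\n')
        if err != nil || (term != "\n" && term != "\r\n") {
            return nil, fmt.Errorf("argument of length %d not terminated by a newline", n)
        }
        raw = append(raw, term...)
    }
    return raw, nil
}

func main() {
    in, err := os.Open(os.Args[1])
    if err != nil {
        panic(err)
    }
    defer in.Close()
    out, err := os.Create(os.Args[2])
    if err != nil {
        panic(err)
    }
    defer out.Close()
    r, w := bufio.NewReader(in), bufio.NewWriter(out)
    defer w.Flush()
    damaged := 0
scan:
    for {
        cmd, err := readCommand(r)
        if err == nil {
            w.Write(cmd)
            continue
        }
        if err == io.EOF {
            break // clean end of file
        }
        // Damaged or truncated command: skip forward to the next line that
        // begins with '*' and try to resume parsing from there.
        damaged++
        for {
            b, perr := r.Peek(1)
            if perr != nil {
                break scan
            }
            if b[0] == '*' {
                break
            }
            if _, perr := r.ReadString('\n'); perr != nil {
                break scan
            }
        }
    }
    fmt.Fprintf(os.Stderr, "skipped %d damaged region(s)\n", damaged)
}

(This is cruder than the hand repairs above, since it can only resynchronize at a *N header and may therefore drop a following intact command as well, but the principle is the same.)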

Thanks very much for your time.

tidwall commented 2 years ago

Sorry for the slow response. A data recovery tool does seem useful. I would consider accepting the tool as a contribution. But if there is an issue with BuntDB, I would personally like to fix that problem before adding a tool.

You mention:

The second example looks to me like failure to truncate(2) in response to a partial write (Ergo will log and ignore this panic by default --- unfortunately the logs are no longer available to confirm this theory).

If the panic is ignored, such that a Go recover() is called and the program continues to run, then this could most certainly lead to data corruption. The fix would be to check the recovered result and, if it's a file system error, exit the program. Or BuntDB could be changed to call os.Exit rather than panic.

In this case, we know that the application was running on a relatively exotic storage layer.

I'm hoping to reproduce this issue. Is there any way to construct a program that causes this issue using the exotic storage layer?

slingamn commented 2 years ago

It was the Gluster Native Client. In principle I think this condition can arise with any networked storage layer:

  1. write(2) partially succeeds, returns an error
  2. the filesystem is too unstable for truncate(2) to succeed
  3. buntdb raises a panic, but the partial write remains
  4. if the panic is suppressed, subsequent writes may go through on top of the partial write, causing corruption (see the sketch after this list)
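
For concreteness, the kind of blanket recovery wrapper I mean in step 4 looks something like this (hypothetical code, not Ergo's actual handler):

package example // hypothetical illustration of the pattern, not Ergo's actual handler

import (
    "log"

    "github.com/tidwall/buntdb"
)

// updateIgnoringPanics logs and swallows any panic raised during a write.
// Because the result is left at its zero value after the recover, the caller
// sees a nil error, and the next update is appended on top of whatever
// partial write is still sitting at the end of the file.
func updateIgnoringPanics(db *buntdb.DB, fn func(tx *buntdb.Tx) error) error {
    defer func() {
        if r := recover(); r != nil {
            log.Printf("ignoring panic during update: %v", r)
        }
    }()
    return db.Update(fn)
}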

os.Exit seems a bit unfriendly :-) Maybe the relevant panic could use a wrapped error of some kind, so it's easier to recognize after a recover() call?

tidwall commented 2 years ago

I added support for a recover mechanism. Would something like this work?

defer func() {
    // The panic value should be an error whose message is prefixed with
    // "buntdb: " and which wraps the error that triggered the panic.
    if err, ok := recover().(error); ok {
        if strings.HasPrefix(err.Error(), "buntdb: ") {
            // Unwrap to compare against the original (fake) error injected by the test.
            if cause := errors.Unwrap(err); cause == nil || cause.Error() != "my fake error" {
                t.Fatal("!")
            }
        }
    }
}()

slingamn commented 2 years ago

I think I was hoping for something compatible with errors.Is? Is there an obstacle to that?
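
To spell out what I'm hoping for (hypothetical code on both sides, not actual BuntDB code): if the panic value wrapped the underlying I/O error with %w, the application could recognize the cause with errors.Is/errors.As after recover(), instead of matching on the message prefix:

package example // hypothetical sketch on both sides, not actual BuntDB code

import (
    "errors"
    "fmt"
    "log"
    "syscall"
)

// What I imagine the library side doing: wrap the underlying I/O error into
// the panic value with %w so the cause survives.
func panicWithWrappedError(ioErr error) {
    panic(fmt.Errorf("buntdb: failed to write AOF: %w", ioErr))
}

// handlePanic is meant to be installed as `defer handlePanic()` around code
// that calls into BuntDB. It inspects the recovered value with errors.Is and
// shuts down on storage failures, since the on-disk file may now contain a
// partial write; anything else is re-raised.
func handlePanic() {
    r := recover()
    if r == nil {
        return
    }
    if err, ok := r.(error); ok &&
        (errors.Is(err, syscall.EIO) || errors.Is(err, syscall.ENOSPC)) {
        log.Fatalf("fatal storage error, refusing to keep writing: %v", err)
    }
    panic(r)
}

That way the check doesn't depend on the exact message text, and it keeps working if the message gains more context later.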

slingamn commented 2 years ago

The Gluster corruption happened again (still no logs AFAIK), so I did a first pass at a recovery tool here:

https://github.com/slingamn/buntrepair