peterbourgon / diskv

A disk-backed key-value store.
http://godoc.org/github.com/peterbourgon/diskv
MIT License
1.4k stars 102 forks source link

While walking, InverseTransform is invoked for directories - is this a bug? #76

Open RobHoman opened 2 years ago

RobHoman commented 2 years ago

Hey @peterbourgon - loving this library and thank you for your stewardship!

Would love to have you take a look at the following lines: https://github.com/peterbourgon/diskv/blob/2566386005f64f58f34e1ff32907800a64537e6a/diskv.go#L597-L601

By my reading, it would be preferable not to call InverseTransform for directories. In my case, I'm using the AdvancedTransform and its inverse in the README, and this is tickling the "panic" line because directories do not carry the expected extension.

What about switching the logic to this? In other words, pass on the directory BEFORE calling InverseTransform.

if info.IsDir() {
    return nil // "pass"
}

key := d.InverseTransform(pathKey)

if !strings.HasPrefix(key, prefix) {
    return nil // "pass"
}
peterbourgon commented 2 years ago

My intuition is that the bug is in the example, not the code. WDYT?

RobHoman commented 2 years ago

Perhaps I'm missing something, but as it stands I think I stand by my initial intuition.

I'll attempt to elaborate in hopes that this clarifies my interpretation of functionality.

Here is an example program:

package main

import (
  "fmt"
  "log"
  "path/filepath"
  "strings"

  "github.com/peterbourgon/diskv/v3"
)

func main() {
  const (
    bbdiskvExtension = ".bbdiskv"
  )

  kvs := diskv.New(
    diskv.Options{
      BasePath: "base",
      AdvancedTransform: func(key string) *diskv.PathKey {
        path := strings.Split(key, "/")
        last := len(path) - 1
        retval := &diskv.PathKey{
          Path:     path[:last],
          FileName: path[last] + bbdiskvExtension,
        }
        fmt.Printf("DEBG Converting Key '%s'\n", key)
        fmt.Printf("DEBG Returning PathKey '%#v'\n", retval)
        return retval
      },

      In package fmt m: func(pathKey *diskv.PathKey) (key string) {
        fmt.Printf("DEBG Inversing PathKey %#v\n", pathKey)
        extension := filepath.Ext(pathKey.FileName)
        if extension != bbdiskvExtension {
          fmt.Printf("DEBG - does not have extension: %#v\n", pathKey)
          return ""
        }
        retval := strings.Join(pathKey.Path, "/") + "/" + pathKey.FileName[:len(pathKey.FileName)-len(bbdiskvExtension)]
        fmt.Printf("DEBG Returning string: %#v\n", retval)
        return retval
      },
      CacheSizeMax: 1024 * 1024,
    },
  )

  type writeArgs struct {
    Key   string
    Value string
  }
  writes := []writeArgs{
    {"a/b", "val-for-a/b"},
  }

  fmt.Println("Writing keys: ")
  for _, write := range writes {
    fmt.Printf("- write key '%s' with value '%s'\n", write.Key, write.Value)
    err := kvs.Write(write.Key, []byte(write.Value))
    if err != nil {
      log.Fatalf("failed to write: %v", err)
    }
  }

  fmt.Println("Listing keys: ")
  keyC := kvs.Keys(nil)
  for key := range keyC {
    fmt.Printf("- key '%s'\n", key)
  }
}

Here is the output:

Writing keys:
- write key 'a/b' with value 'val-for-a/b'
DEBG Converting Key 'a/b'
DEBG Returning PathKey '&diskv.PathKey{Path:[]string{"a"}, FileName:"b.bbdiskv", originalKey:""}'
Listing keys:
DEBG Inversing PathKey &diskv.PathKey{Path:[]string{}, FileName:".", originalKey:""}
DEBG - does not have extension: &diskv.PathKey{Path:[]string{}, FileName:".", originalKey:""}
DEBG Inversing PathKey &diskv.PathKey{Path:[]string{}, FileName:"a", originalKey:""}
DEBG - does not have extension: &diskv.PathKey{Path:[]string{}, FileName:"a", originalKey:""}
DEBG Inversing PathKey &diskv.PathKey{Path:[]string{"a"}, FileName:"b.bbdiskv", originalKey:""}
DEBG Returning string: "a/b"
- key 'a/b'

By my understanding, the InverseTransform is being called back to analyze the root directory "." and also the directory a/. But these are not legitimate outputs of the AdvancedTransform function in the first place, and so there is no well defined answer that InverseTransform can give. Furthermore, I don't see why the InverseTransform callback should even be consulted by the walker function if the decision in the following line is to skip the item because info.IsDir() is true.

Another way of looking at it is that it seems throughout the code that there is an intended invariant that directories on the underlying disk never represent an actual key in the store, and I think that invariant implies that the InverseTransform callback should not be asked to ever look at such non-keys.

peterbourgon commented 2 years ago

Okay, I see what you mean.

I think the underlying problem here is an error in the original API design, and that design error's interactions with the code in the repo and the example in the README. One aspect/view on that design error is that anything in diskv uses panic to express error conditions, which is 100% wrong, and 100% my mistake. Another way to describe it would be that the various funcs, including InverseTransform, don't return errors. They should. Or, at least, their return values should be able to express an "invalid" state.

I agree that your proposed change would be an improvement. I would also like to see the example in the README updated to not panic. I'm not sure if that would require a change to the function signature.