google / btree

BTree provides a simple, ordered, in-memory data structure for Go programs.
Apache License 2.0
3.9k stars 414 forks source link

Bounds issue in insertAt #33

Closed dynajoe closed 4 years ago

dynajoe commented 4 years ago

https://github.com/google/btree/blob/be84af90a1f71c9eeac820a4cdacb863122396d6/btree.go#L204

I'm curious if this bounds issue is important? I'm using this implementation as a learning opportunity for Go and ran into this while studying the code.

The check whether the index is less than the length seems like it would prevent an unnecessary copy. But instead, this would result in a bounds error when attempting to set the item.

gconnell commented 4 years ago

Hey, joeandraverde,

What we're doing here is making space to insert an item into our array. Given an array of:

[a, b, c, d, e]

We want to add an item at position i=2. So, we:

1) grow the array so it can fit one more element, by appending a nil element to the end https://github.com/google/btree/blob/be84af90a1f71c9eeac820a4cdacb863122396d6/btree.go#L203

[a, b, c, d e, nil]

2) move the elements [2:4] over one, to make space, by copying copy(s[2+1:], s[2:]) which copies from s[2:] into s[2+1:], with https://github.com/google/btree/blob/be84af90a1f71c9eeac820a4cdacb863122396d6/btree.go#L205

[a, b, c, c, d, e]

3) overwrite the element at [2] with our new value 'x', with https://github.com/google/btree/blob/be84af90a1f71c9eeac820a4cdacb863122396d6/btree.go#L207

[a, b, x, c, d, e]

The one caveat to all of this is that, in cases where we're "inserting" at the end of the list, step 2 isn't necessary and in fact causes panics. If, for example, len(s) == 3 and we want to insert at 3 (also known as just appending to the end, steps 1 and 3 above work fine, but step 2 fails because it tries to copy(s[4:], s[3:]), and s[4:] is out of range for a len(s)==3 array.

Your comment "The check whether the index is less than the length seems like it would prevent an unnecessary copy" is correct, but it's not the API we need to provide. This implementation needs to be able to insert anywhere in the list, including at the end (where len(s) == i).

Hope that helps!

dynajoe commented 4 years ago

Let me clarify by example:

package main

type node struct {}

type children []*node

func (s *children) insertAt(index int, n *node) {
    *s = append(*s, nil)
    if index < len(*s) {
        copy((*s)[index+1:], (*s)[index:])
    }
    (*s)[index] = n
}

func main() {
    example := make(children, 10)

    // this would insert at the end, expanding the array. OK
    // length == 10, max index == 9, new index == 10 (expanding the array)
    //example.insertAt(10, &node{})

    // bounds ERROR 
    // length == 10, max index == 9, new index == 11 (overshooting the array that gets expanded by 1)
    example.insertAt(11, &node{})
}
gconnell commented 4 years ago

The if statement you've highlighted actually assumes that the passed-in index is correctly <= len(example), and it differentiates between:

func main() {
  example := make(children, 10)
  example.insertAt(10, &node{})  // if is false, no need to copy stuff
  example.insertAt(5, &node{})  // if is true, we're writing in the middle of the list so need to move the stuff after the index
}