nim-lang / RFCs

A repository for your Nim proposals.
135 stars 26 forks source link

std/lists: support destructive add/prepend for `SinglyLinkedList`, `DoublyLinkedList` + other `lists` improvements #303

Open timotheecour opened 3 years ago

timotheecour commented 3 years ago

current implementation in https://github.com/nim-lang/Nim/pull/16362 is problematic because:

I've just realized that this operation is not really meaningful for doubly linked lists, as it will always leave one in an inconsistent state.

But actually it turns out there's a similar problem with SinglyLinkedList: b is left in inconsistent state after being added twice, and other problems below:

when defined case1:
  import lists, sequtils
  var a = [0,1].toSinglyLinkedList
  a.add a
  echo a # hangs, since you have a cycle

when defined case2:
  import lists, sequtils
  var a = [0,1].toSinglyLinkedList
  var b = [2,3].toSinglyLinkedList
  a.add b
  a.add b
  echo b # would hang: b is left in inconsistent state after being added twice
  # echo a # would hang

proposal

This proposal involves adding a flag to indicate that a list is cycle-free assuming user sticks add, prepend + other APIS in lists.nim; overhead is negligible.

The proposal is to support destructive add and prepend for both SinglyLinkedList and DoublyLinkedList as follows:

Note: the owned flag is added to SinglyLinkedList, not SinglyLinkedNodeObj, so it doesn't asymptotically increase memory for normal use (few lists, lots of nodes)

Araq commented 3 years ago

While I can imagine your solution to work, I think it's too complex and I am not aware of other list implementations that use your solution.

See also: https://doc.rust-lang.org/std/collections/struct.LinkedList.html#method.append

This reuses all the nodes from other and moves them into self. After this operation, other becomes empty. This operation should compute in O(1) time and O(1) memory.

timotheecour commented 3 years ago

that's almost the same approach, in fact that was my original idea but then i figured keeping b (but flagging it as "owned" (or "consumed")) is a bit more flexible. In either case you have a destructive add/prepend, but with "owned" you can still peek at the data that was appended.

Either way, either using an "owned" flag or destroying b would be a net improvement over https://github.com/nim-lang/Nim/pull/16362

(thanks for the link btw)

salvipeter commented 3 years ago

This won't solve the problem of DoublyLinkedLists, because if you a.add(b) and then c.add(b), the first node of b cannot refer to the last node of a and c at the same time.

In my opinion, if someone uses list concatenation reasonably, these cases do not occur. If you create a cycle, you get an infinite loop, I think of that as a "fact of life" :) Adding a dirty bit, like the one proposed here, will result in many extra computations, and won't save the user if he does something like a.tail.next = a.head. But this is a good reason to add an outplace operator, as well.

Also, e.g. Common Lisp has a special variable that determines whether cycles are detected or not when printing. We could at least give a predicate to decide if a given list is circular.

salvipeter commented 3 years ago

I like the Rust solution, because it's so simple, although takes a bit of freedom away (sometimes it is good to have shared structures).

timotheecour commented 3 years ago

This won't solve the problem of DoublyLinkedLists, because if you a.add(b) and then c.add(b), the first node of b cannot refer to the last node of a and c at the same time.

after a.add b, b is marked as owned, then when you call c.add(b) c will also be marked as owned according to what i wrote above: if b.owned: a.owned = true else: b.owned = true, to indicate c is in an invalid state (ie that may contain cycles); this may or may not be what you want, but at least it's self consistent.

If destroying b instead of using an owned flag (ie the rust approach), the semantics are instead making c.add(b) a noop since b was set to empty after a.add(b); it's different, but also self consistent.

note: special case of a.add a or similar

in rust:

fn main() {
  use std::collections::LinkedList;
  let mut list1 = LinkedList::new();
  list1.push_back('a');
  list1.push_back('b');
  println!("{:?}", list1);
  list1.append(&mut list1); // error
}

you (correctly) get an error:

error[E0499]: cannot borrow `list1` as mutable more than once at a time
 --> /Users/timothee/git_clone//nim//timn//tests/nim/all/t11524b.rs:8:3
  |
8 |   list1.append(&mut list1);
  |   ^^^^^^------^----------^
  |   |     |      |
  |   |     |      first mutable borrow occurs here
  |   |     first borrow later used by call
  |   second mutable borrow occurs here

The question is, how do we do the same for nim (in a robust way, not just this special case), with same O(1) complexity? https://github.com/nim-lang/Nim/pull/14976 would work but is there something simpler?

var a = [0,1].toSinglyLinkedList
var b = [2,3].toSinglyLinkedList
a.add b
assert a.toSeq == [0,1,2,3]
assert b.toSeq == [] # with the rust behavior, ie no owned flag, just destroy b
a.add a  # this should give an error

less obvious example:

when true:
  import lists, sequtils
  var a = initDoublyLinkedList[int]()
  var n = newDoublyLinkedNode[int](0)
  a.append n
  a.append 1
  a.append 2
  var b = a
  b.remove(n)
  echo a
  echo b
  a.add b # would create a cycle

(note that with the "owned" flag, this would work (after a.add a, a would automatically be marked as owned))

remove can also create cycles:

this example uses a DoublyLinkedRing instead

  var a = initDoublyLinkedRing[int]()
  var n = newDoublyLinkedNode[int](0)
  a.append n
  a.append 1
  a.append 2
  var b = a
  b.remove(n)
  echo b
  echo a # hangs
salvipeter commented 3 years ago

I am not convinced about the owned solution for doubly linked lists. In the above example, the owned flag will be consistent, but the whole list will not be, because the invariant b.head.prev == nil is not satisfied.

As for handling a.add(a), I think that is also a case when the user should know better...

If someone would like to uses lists in a FP-like manner, he would need something like first and rest:

func first*[T](L: SinglyLinkedList[T]): T = L.head.value
func rest*[T](L: SinglyLinkedList[T]): SinglyLinkedList[T] =
  result = initSinglyLinkedList[T]()
  result.head = L.head.next
  result.tail = L.head.next

But then a.rest would also share structure with a, leading to similar cases. I think the solution is to don't treat these as problems.

Araq commented 3 years ago

We should follow Rust and move the elements.

timotheecour commented 3 years ago

We should follow Rust and move the elements.

indeed.

We also need a copy proc for a shallow copy of the list, for non-destructive append/prepend, eg:

var a = [0,1].toSinglyLinkedList
var b = [2,3].toSinglyLinkedList
# a.add b # ok but destructive for b
a.add b.copy # not destructive for b
assert a.toSeq == [0,1,2,3]
assert b.toSeq == [2,3]
a.add a.copy

note: copy is intentionally not deepCopy here. It's O(n) where n = num elements of the list

example:

type Foo = ref object
  x: int
var f = Foo(x: 1)
var a = [f, f].toSinglyLinkedList
var b = [f, f].toSinglyLinkedList
a.add b.copy
assert b.head.value == f
salvipeter commented 3 years ago

Implemented the move semantics, and added a shallow copy as well (both for both singly- and doubly linked lists). One thing that bothers me is that with the current implementation I can't write a.add(b.copy), because b.copy is immutable, I need to add another variable explicitly:

var
  a = [0, 1].toSinglyLinkedList
  b = [2, 3].toSinglyLinkedList
  c = b.copy
a.add(c)

Is there a workaround for this?

timotheecour commented 3 years ago

how about:

proc addMove[T: SinglyLinkedList | DoublyLinkedList](a: var T, b: var T) = ...
proc add[T: SinglyLinkedList | DoublyLinkedList](a: var T, b: T) =
  var b2 = b.copy
  addMove(a, b2)

ditto with prepend, prependMove

it might actually be better, because in rest of stdlib, add is not destructive.

bikeshedding: addMove or addFrom

Araq commented 3 years ago

Is there a workaround for this?

Use a sink parameter. And maybe overwrite the = operator for linked lists.

salvipeter commented 3 years ago

Use a sink parameter.

That's nice! Haven't heard of these before. So if the second list is a sink parameter, then even though add is implemented as a move, it is actually a copy when the list passed as second parameter is used afterwards. (Or, to be exact, when it cannot be proved that it is not used afterwards, right?)

I like this, but I don't really like the idea that this operation is O(1) or O(n) based on what the compiler thinks, unknown to the user.

Araq commented 3 years ago

You can make the = operator for lists an .error and then the copies cannot be implicit.

salvipeter commented 3 years ago

That sounds like a good solution, but when I actually tried it out at this simple example:

import lists
var a = [1, 2, 3].toSinglyLinkedList
let b = [4, 5].toSinglyLinkedList
a.add b.copy
assert $a == "[1, 2, 3, 4, 5]"
assert $b == "[4, 5]"
a.add b
assert $a == "[1, 2, 3, 4, 5, 4, 5]" # b is inaccessable

... I got an error for the a.add b line, saying that it's not the last read of b... (this was my whole file)

timotheecour commented 3 years ago

You can make the = operator for lists an .error and then the copies cannot be implicit.

but that would prevent valid use cases, and be a breaking change (not necessarily in the right direction).

import std/lists
type Foo = object
  list: DoublyLinkedList[int]
proc main()=
  block:
    var a = initDoublyLinkedList[int]()
    let b = a # was legal
  block:
    let a = Foo(list: initDoublyLinkedList[int]())
    let b = a.list # was legal
main()

(and {.byaddr.} still runs into the famous template bug https://github.com/nim-lang/Nim/issues/15920 so isn't always a workaround)

I do prefer the terminology addMove for the overload without copy because rest of stdlib assumes an add doesn't consume the RHS

I don't really like the idea that this operation is O(1) or O(n) based on what the compiler thinks, unknown to the user.

me neither, it's error prone

Araq commented 3 years ago

... I got an error for the a.add b line, saying that it's not the last read of b... (this was my whole file)

Because b is a global variable, the compiler doesn't try too hard to reason about global variables.

Araq commented 3 years ago

I do prefer the terminology addMove for the overload without copy because rest of stdlib assumes an add doesn't consume the RHS

That's true. But addMove isn't the best name. Maybe, addConsumed?

salvipeter commented 3 years ago

I have put it in as addMove for the time being... maybe addMoved is the best of both worlds?

timotheecour commented 3 years ago

addMoved is a bit shorter; any name that works well with both add and prepend is fine. so this would be:

salvipeter commented 3 years ago

Do we need prepend[Moved]? Its naming suggests that the alternative is append[Moved]. And a.prependMove b is essentially just

b.addMove a
swap a, b

(or a = b.addMove a, if you don't care about b being empty afterwards), while a.prepend b is

a = b.copy.addMove a
timotheecour commented 3 years ago

add is better than append, even if we later add prepend:

with add: this works out of the box with procs assuming some add, eg collect:

import lists, sugar
# it works, I tried it after s/append/add/ in your PR
let b = collect(initSinglyLinkedList): for i in 0..<3: i

the classical "works better in generic code" is a real argument ... and it doesn't mean that "everything compiles", just the stuff that makes sense ;-).

my suggestion is:

salvipeter commented 3 years ago

I was not really reasoning for append, more like reasoning against prepend... given that we have add, prepending is a trivial issue, as shown above. I don't consider prepending a basic operation.

timotheecour commented 3 years ago

arguments for prepend(a: list,b: list):

in any case this can be settled after your PR since it's orthogonal

Araq commented 3 years ago

the classical "works better in generic code" is a real argument ... and it doesn't mean that "everything compiles", just the stuff that makes sense ;-).

It is a real argument in this case because you gave a real example with collect that is convincing.

timotheecour commented 3 years ago

@salvipeter after https://github.com/nim-lang/Nim/pull/16362 (which I just LGTM'd) is merged, do you have interest in writing another PR for the following:

salvipeter commented 3 years ago

I've created a new PR.

timotheecour commented 3 years ago
timotheecour commented 3 years ago
when true:
  import lists
  block:
    var a = [10,11,12,13].toDoublyLinkedList
    var b = [20,21,22,23].toDoublyLinkedList
    a.remove(b.head)
    echo a # [10, 11, 12, 13]
    echo b # [20, 21, 22, 23]
  block:
    var a = [10,11,12,13].toDoublyLinkedList
    var b = [20,21,22,23].toDoublyLinkedList
    a.remove(b.head.next)
    echo a # [10, 11, 12, 13]
    echo b # [20, 22, 23]
timotheecour commented 3 years ago
when true:
  import lists
  var a = [10,11,12].toSinglyLinkedList
  a.add a.head.next
  echo take(a, 6)
current output: @[10, 11]
expected output: @[10, 11, 12, 11, 12, 11]  # segment followed by cycle

this would be consistent with behavior for a.add a which creates a cycle

(take is defined here: https://github.com/timotheecour/Nim/issues/504)

timotheecour commented 3 years ago

unlike with SinglyLinkedList (https://github.com/nim-lang/RFCs/issues/303#issuecomment-753417576), there's no way to have segment followed by cycle because pred(11) would need to have 2 values (12 and 10). So the simplest is to raise an error when n is such that n.prev != nil

If user wants to add a n with n.prev!=nil, we can discuss that; one possibility is to call split (refs https://github.com/nim-lang/Nim/pull/16536#issuecomment-753394613) or setting n.prev = nil first.

(take is defined here: https://github.com/timotheecour/Nim/issues/504)

timotheecour commented 3 years ago