Open timotheecour opened 3 years ago
While I can imagine your solution to work, I think it's too complex and I am not aware of other list implementations that use your solution.
See also: https://doc.rust-lang.org/std/collections/struct.LinkedList.html#method.append
This reuses all the nodes from other and moves them into self. After this operation, other becomes empty. This operation should compute in O(1) time and O(1) memory.
that's almost the same approach, in fact that was my original idea but then i figured keeping b
(but flagging it as "owned" (or "consumed")) is a bit more flexible. In either case you have a destructive add/prepend, but with "owned" you can still peek at the data that was appended.
Either way, either using an "owned" flag or destroying b would be a net improvement over https://github.com/nim-lang/Nim/pull/16362
(thanks for the link btw)
This won't solve the problem of DoublyLinkedLists
, because if you a.add(b)
and then c.add(b)
, the first node of b
cannot refer to the last node of a
and c
at the same time.
In my opinion, if someone uses list concatenation reasonably, these cases do not occur. If you create a cycle, you get an infinite loop, I think of that as a "fact of life" :) Adding a dirty bit, like the one proposed here, will result in many extra computations, and won't save the user if he does something like a.tail.next = a.head
.
But this is a good reason to add an outplace operator, as well.
Also, e.g. Common Lisp has a special variable that determines whether cycles are detected or not when printing. We could at least give a predicate to decide if a given list is circular.
I like the Rust solution, because it's so simple, although takes a bit of freedom away (sometimes it is good to have shared structures).
This won't solve the problem of DoublyLinkedLists, because if you a.add(b) and then c.add(b), the first node of b cannot refer to the last node of a and c at the same time.
after a.add b
, b is marked as owned, then when you call c.add(b)
c
will also be marked as owned
according to what i wrote above: if b.owned: a.owned = true else: b.owned = true
, to indicate c
is in an invalid state (ie that may contain cycles); this may or may not be what you want, but at least it's self consistent.
If destroying b
instead of using an owned flag (ie the rust approach), the semantics are instead making c.add(b)
a noop since b was set to empty after a.add(b)
; it's different, but also self consistent.
a.add a
or similarin rust:
fn main() {
use std::collections::LinkedList;
let mut list1 = LinkedList::new();
list1.push_back('a');
list1.push_back('b');
println!("{:?}", list1);
list1.append(&mut list1); // error
}
you (correctly) get an error:
error[E0499]: cannot borrow `list1` as mutable more than once at a time
--> /Users/timothee/git_clone//nim//timn//tests/nim/all/t11524b.rs:8:3
|
8 | list1.append(&mut list1);
| ^^^^^^------^----------^
| | | |
| | | first mutable borrow occurs here
| | first borrow later used by call
| second mutable borrow occurs here
The question is, how do we do the same for nim (in a robust way, not just this special case), with same O(1) complexity? https://github.com/nim-lang/Nim/pull/14976 would work but is there something simpler?
var a = [0,1].toSinglyLinkedList
var b = [2,3].toSinglyLinkedList
a.add b
assert a.toSeq == [0,1,2,3]
assert b.toSeq == [] # with the rust behavior, ie no owned flag, just destroy b
a.add a # this should give an error
less obvious example:
when true:
import lists, sequtils
var a = initDoublyLinkedList[int]()
var n = newDoublyLinkedNode[int](0)
a.append n
a.append 1
a.append 2
var b = a
b.remove(n)
echo a
echo b
a.add b # would create a cycle
(note that with the "owned" flag, this would work (after a.add a
, a would automatically be marked as owned))
this example uses a DoublyLinkedRing instead
var a = initDoublyLinkedRing[int]()
var n = newDoublyLinkedNode[int](0)
a.append n
a.append 1
a.append 2
var b = a
b.remove(n)
echo b
echo a # hangs
I am not convinced about the owned
solution for doubly linked lists. In the above example, the owned
flag will be consistent, but the whole list will not be, because the invariant b.head.prev == nil
is not satisfied.
As for handling a.add(a)
, I think that is also a case when the user should know better...
If someone would like to uses lists
in a FP-like manner, he would need something like first
and rest
:
func first*[T](L: SinglyLinkedList[T]): T = L.head.value
func rest*[T](L: SinglyLinkedList[T]): SinglyLinkedList[T] =
result = initSinglyLinkedList[T]()
result.head = L.head.next
result.tail = L.head.next
But then a.rest
would also share structure with a
, leading to similar cases. I think the solution is to don't treat these as problems.
We should follow Rust and move the elements.
We should follow Rust and move the elements.
indeed.
We also need a copy
proc for a shallow copy of the list, for non-destructive append/prepend, eg:
var a = [0,1].toSinglyLinkedList
var b = [2,3].toSinglyLinkedList
# a.add b # ok but destructive for b
a.add b.copy # not destructive for b
assert a.toSeq == [0,1,2,3]
assert b.toSeq == [2,3]
a.add a.copy
note: copy
is intentionally not deepCopy
here. It's O(n) where n = num elements of the list
example:
type Foo = ref object
x: int
var f = Foo(x: 1)
var a = [f, f].toSinglyLinkedList
var b = [f, f].toSinglyLinkedList
a.add b.copy
assert b.head.value == f
Implemented the move semantics, and added a shallow copy
as well (both for both singly- and doubly linked lists).
One thing that bothers me is that with the current implementation I can't write a.add(b.copy)
, because b.copy
is immutable, I need to add another variable explicitly:
var
a = [0, 1].toSinglyLinkedList
b = [2, 3].toSinglyLinkedList
c = b.copy
a.add(c)
Is there a workaround for this?
how about:
proc addMove[T: SinglyLinkedList | DoublyLinkedList](a: var T, b: var T) = ...
proc add[T: SinglyLinkedList | DoublyLinkedList](a: var T, b: T) =
var b2 = b.copy
addMove(a, b2)
ditto with prepend
, prependMove
it might actually be better, because in rest of stdlib, add is not destructive.
bikeshedding: addMove or addFrom
Is there a workaround for this?
Use a sink
parameter. And maybe overwrite the =
operator for linked lists.
Use a
sink
parameter.
That's nice! Haven't heard of these before. So if the second list is a sink parameter, then even though add
is implemented as a move, it is actually a copy when the list passed as second parameter is used afterwards. (Or, to be exact, when it cannot be proved that it is not used afterwards, right?)
I like this, but I don't really like the idea that this operation is O(1) or O(n) based on what the compiler thinks, unknown to the user.
You can make the =
operator for lists an .error
and then the copies cannot be implicit.
That sounds like a good solution, but when I actually tried it out at this simple example:
import lists
var a = [1, 2, 3].toSinglyLinkedList
let b = [4, 5].toSinglyLinkedList
a.add b.copy
assert $a == "[1, 2, 3, 4, 5]"
assert $b == "[4, 5]"
a.add b
assert $a == "[1, 2, 3, 4, 5, 4, 5]" # b is inaccessable
... I got an error for the a.add b
line, saying that it's not the last read of b
... (this was my whole file)
You can make the = operator for lists an .error and then the copies cannot be implicit.
but that would prevent valid use cases, and be a breaking change (not necessarily in the right direction).
import std/lists
type Foo = object
list: DoublyLinkedList[int]
proc main()=
block:
var a = initDoublyLinkedList[int]()
let b = a # was legal
block:
let a = Foo(list: initDoublyLinkedList[int]())
let b = a.list # was legal
main()
(and {.byaddr.}
still runs into the famous template bug https://github.com/nim-lang/Nim/issues/15920 so isn't always a workaround)
I do prefer the terminology addMove
for the overload without copy
because rest of stdlib assumes an add
doesn't consume the RHS
I don't really like the idea that this operation is O(1) or O(n) based on what the compiler thinks, unknown to the user.
me neither, it's error prone
... I got an error for the a.add b line, saying that it's not the last read of b... (this was my whole file)
Because b
is a global variable, the compiler doesn't try too hard to reason about global variables.
I do prefer the terminology addMove for the overload without copy because rest of stdlib assumes an add doesn't consume the RHS
That's true. But addMove
isn't the best name. Maybe, addConsumed
?
I have put it in as addMove
for the time being... maybe addMoved
is the best of both worlds?
addMoved is a bit shorter; any name that works well with both add and prepend is fine. so this would be:
Do we need prepend[Moved]
? Its naming suggests that the alternative is append[Moved]
.
And a.prependMove b
is essentially just
b.addMove a
swap a, b
(or a = b.addMove a
, if you don't care about b
being empty afterwards), while a.prepend b
is
a = b.copy.addMove a
add
is better than append
, even if we later add prepend
:
with add
: this works out of the box with procs assuming some add
, eg collect
:
import lists, sugar
# it works, I tried it after s/append/add/ in your PR
let b = collect(initSinglyLinkedList): for i in 0..<3: i
the classical "works better in generic code" is a real argument ... and it doesn't mean that "everything compiles", just the stuff that makes sense ;-).
my suggestion is:
don't modify pre-existing append
in this PR but use add
for your new procs in your PR
in future PR, change pre-existing append
to add
in lists.nim and add:
template append*[T: DoublyLinkedList or SinglyLinkedList](a: T, b: T) {.deprecated: "use add".} = add(a,b)
template append*[T](a: DoublyLinkedList[T] or SinglyLinkedList[T], b: T) {.deprecated: "use add".} = add(a,b)
(I also tried, seems to work; whereas {.deprecated: [append: add].}
hits another bug that really should be fixed)
add an entry to https://nim-lang.github.io/Nim/nep1.html specifying this preferred terminology (add + prepend instead of append + prepend)
I was not really reasoning for append
, more like reasoning against prepend
... given that we have add
, prepending is a trivial issue, as shown above. I don't consider prepending a basic operation.
arguments for prepend(a: list,b: list)
:
prepend(a: list: b: elem)
in lists.nim (https://nim-lang.github.io/Nim/lists.html#prepend%2CSinglyLinkedList%5BT%5D%2CSinglyLinkedNode%5BT%5D)add(a:list, b:elem)
and add(a: list, b: list)
)b.addMove a
swap a, b
in any case this can be settled after your PR since it's orthogonal
the classical "works better in generic code" is a real argument ... and it doesn't mean that "everything compiles", just the stuff that makes sense ;-).
It is a real argument in this case because you gave a real example with collect
that is convincing.
@salvipeter after https://github.com/nim-lang/Nim/pull/16362 (which I just LGTM'd) is merged, do you have interest in writing another PR for the following:
remove
for SinglyLinkedList
(analog to the one for DoublyLinkedList
)prepend
for SinglyLinkedList
and DoublyLinkedList
append
to add
(rationale was discussed in https://github.com/nim-lang/RFCs/issues/303#issuecomment-747787419) and add a deprecated append
for backward compatibilityI've created a new PR.
when true:
import lists
import std/enumerate
var a = [10,11,12,13].toDoublyLinkedList
echo a
a.remove(nil)
echo a
instead, we should use assert n!=nil
(and not doAssert
)
remove(a; DoublyLinkedList, n)
behavior when n notin a
is error-prone; I don't think it's possible to fix (because O(1) behavior should be kept) but it should be documented (it's not a "noop")when true:
import lists
block:
var a = [10,11,12,13].toDoublyLinkedList
var b = [20,21,22,23].toDoublyLinkedList
a.remove(b.head)
echo a # [10, 11, 12, 13]
echo b # [20, 21, 22, 23]
block:
var a = [10,11,12,13].toDoublyLinkedList
var b = [20,21,22,23].toDoublyLinkedList
a.remove(b.head.next)
echo a # [10, 11, 12, 13]
echo b # [20, 22, 23]
add(a: SinglyLinkedList, n: Node)
behavior when n in a
is bad, it should be as follows:when true:
import lists
var a = [10,11,12].toSinglyLinkedList
a.add a.head.next
echo take(a, 6)
current output: @[10, 11]
expected output: @[10, 11, 12, 11, 12, 11] # segment followed by cycle
this would be consistent with behavior for a.add a
which creates a cycle
(take is defined here: https://github.com/timotheecour/Nim/issues/504)
add(a: DoublyLinkedList, n: Node)
behavior when n in a
is bad, it should be as follows:
when true:
import lists
var a = [10,11,12].toDoublyLinkedList
a.add a.head.next
echo take(a, 6)
current output: @[10, 11]
expected output: raise `ValueError`
unlike with SinglyLinkedList
(https://github.com/nim-lang/RFCs/issues/303#issuecomment-753417576), there's no way to have segment followed by cycle
because pred(11) would need to have 2 values (12 and 10). So the simplest is to raise an error when n
is such that n.prev != nil
If user wants to add a n with n.prev!=nil
, we can discuss that; one possibility is to call split
(refs https://github.com/nim-lang/Nim/pull/16536#issuecomment-753394613) or setting n.prev = nil
first.
(take is defined here: https://github.com/timotheecour/Nim/issues/504)
current implementation in https://github.com/nim-lang/Nim/pull/16362 is problematic because:
But actually it turns out there's a similar problem with
SinglyLinkedList
:b
is left in inconsistent state after being added twice, and other problems below:proposal
This proposal involves adding a flag to indicate that a list is cycle-free assuming user sticks add, prepend + other APIS in lists.nim; overhead is negligible.
The proposal is to support destructive
add
andprepend
for bothSinglyLinkedList
andDoublyLinkedList
as follows:add a field
owned: bool
inSinglyLinkedList
implement
add
as follows:implement
prepend
in analogous way note that prepend is still needed even if we have add because prepend(a,b) will set b.owned but add(a,b) will set b.owned), so you can't simply implement prepend(a,b) in terms ofadd
as that would "own" aimplement
$
anditems
as follows:then this will make
$
, andtoSeq
correct instead of infinite loop on cyclesNote: the
owned
flag is added toSinglyLinkedList
, notSinglyLinkedNodeObj
, so it doesn't asymptotically increase memory for normal use (few lists, lots of nodes)DoublyLinkedList