Open Alogani opened 7 months ago
Well but handling cycles properly is kinda part of the purpose of a deepcopy operation.
Oh, I understand, thanks for your response 😔
Hello @Araq,
You are right, handling data cyclic structures has a crucial importance. I thought of a solution that might fits
I have micro benchmarked it compared to the acyclic version I proposed, there is a non negligible performance hit (>35% slower). So it might be worth to add a special bool value to tell the deepcopy the structure is acyclic (or cyclic). Or better, get the information directly from the compiler/type system (if it keeps it)
proc deepCopyImpl[T](dest: var T; src: T, alreadyCopied: var seq[pointer]) =
# An optimisation could be made if it is possible to have the info about if object is acyclic (compiler knows it ?)
when typeof(src) is ref or typeof(src) is ptr:
if src != nil:
let srcPtr = cast[pointer](src)
if srcPtr in alreadyCopied:
dest = src
else:
dest = T()
alreadyCopied.add srcPtr
dest[] = src[]
deepCopyImpl(dest[], src[], alreadyCopied)
elif typeof(src) is object:
for _, v1, v2 in fieldPairs(dest, src):
deepCopyImpl(v1, v2, alreadyCopied)
proc deepCopy*[T](dest: var T; src: T) =
## This procedure copies values and create new object for references
## Also copies pointers, so unmanaged memory is unsafe if pointer is not wrapped into an object with a destructor
# Should behave exactly like system.deepCopy
dest = src
var alreadyCopied: seq[pointer]
deepCopyImpl(dest, src, alreadyCopied)
However comparison is always a bit more complicated, so there is certainly corner cases where it might break. I still include it, because it's just so nice to test if deepCopy worked.
func deepEqualImpl[T](a, b: T, alreadyCompared: var seq[(pointer, pointer)]): bool =
result = true
when typeof(a) is ref or typeof(a) is ptr:
if a == nil or b == nil:
result = a == b
else:
let ptrTuple = (cast[pointer](a), cast[pointer](b))
if ptrTuple in alreadyCompared:
result = true
else:
alreadyCompared.add ptrTuple
result = deepEqualImpl(a[], b[], alreadyCompared)
elif typeof(a) is object:
for name, v1, v2 in fieldPairs(a, b):
result = deepEqualImpl(v1, v2, alreadyCompared)
if result == false:
return
else:
result = a == b
func deepEqual*[T](a, b: T): bool =
## Check values recursively
## Ignore References/adresse inequality
var alreadyCompared: seq[(pointer, pointer)]
deepEqualImpl(a, b, alreadyCompared)
runTest("Cyclic data structure"):
type Node = ref object
next: Node
data: int
var a = Node(data: 1)
var b = Node(data: 2)
var c = Node(data: 3)
a.next = b
b.next = c
c.next = a
var aCopy: Node
deep.deepCopy(aCopy, a)
doAssert deep.deepEqual(aCopy, a) == true
doAssert aCopy.data == a.data
doAssert aCopy.next.data == a.next.data
doAssert aCopy.next.next.data == a.next.next.data
doAssert aCopy.next.next.next.data == a.next.next.next.data
aCopy.data = 42
doAssert deep.deepEqual(aCopy, a) == false
ACyclic data structure
>= BENCH system.deepCopy
>-- PASS: 42.052 op/ms | 23.780 μs/op
>= BENCH proposal.deepCopy
>-- PASS: 130.32 op/ms | 7.6733 μs/op
Cyclic data structure
>= BENCH system.deepCopy
>-- PASS: 306.02 op/ms | 3.2678 μs/op
>= BENCH personal.deepCopy
>-- PASS: 375.42 op/ms | 2.6637 μs/op
Old version :
>= BENCH system.deepCopy
>-- PASS: 44.022 op/ms | 22.716 μs/op
>= BENCH proposal.deepCopy
>-- PASS: 178.40 op/ms | 5.6055 μs/op
Because it uses fieldpairs, it can get and copy both private an public fields. This is intended.
If you really care about performance you'd change the seq
to HashSet
, but now it depends on std/sets
.
Hello @beef331,
If I propose something for Nim, my care is not the most important. "Its design focuses on efficiency, expressiveness, and elegance (in that order of priority)." I'm no senior developper, so I have never said it was perfect, it's a proposition :smile:
You are right, hashset is quickly more efficient (more than 10 refs), I haven't thought about it.
Here is the updated proposition :
import std/sets
func deepEqualImpl[T](a, b: T, alreadyCompared: var HashSet[(pointer, pointer)]): bool =
result = true
when typeof(a) is ref or typeof(a) is ptr:
if a == nil or b == nil:
result = a == b
else:
let ptrTuple = (cast[pointer](a), cast[pointer](b))
if ptrTuple in alreadyCompared:
result = true
else:
alreadyCompared.incl ptrTuple
result = deepEqualImpl(a[], b[], alreadyCompared)
elif typeof(a) is object:
for name, v1, v2 in fieldPairs(a, b):
result = deepEqualImpl(v1, v2, alreadyCompared)
if result == false:
return
else:
result = a == b
func deepEqual*[T](a, b: T): bool =
## Check values recursively
## Ignore References/adresse inequality
var alreadyCompared: HashSet[(pointer, pointer)]
deepEqualImpl(a, b, alreadyCompared)
proc deepCopyImpl[T](dest: var T; src: T, alreadyCopied: var HashSet[pointer]) =
# An optimisation could be made if it is possible to have the info about if object is acyclic (compiler knows it ?)
when typeof(src) is ref or typeof(src) is ptr:
if src != nil:
let srcPtr = cast[pointer](src)
if srcPtr in alreadyCopied:
dest = src
else:
dest = T()
alreadyCopied.incl srcPtr
dest[] = src[]
deepCopyImpl(dest[], src[], alreadyCopied)
elif typeof(src) is object:
for _, v1, v2 in fieldPairs(dest, src):
deepCopyImpl(v1, v2, alreadyCopied)
proc deepCopy*[T](dest: var T; src: T) =
## This procedure copies values and create new object for references
## Also copies pointers, so unmanaged memory is unsafe if pointer is not wrapped into an object with a destructor
# Should behave exactly like system.deepCopy
dest = src
var alreadyCopied: HashSet[pointer]
deepCopyImpl(dest, src, alreadyCopied)
JS backend also needs a deepcopy implementation https://github.com/nim-lang/Nim/issues/4631
@ringabout Sorry, I don't know js enough ! So I won't try to implement it
Does javascript already has a deepcopy feature ? https://stackoverflow.com/questions/122102/what-is-the-most-efficient-way-to-deep-clone-an-object-in-javascript
Hello everyone,
I've worked a bit more on deepCopy/deepEqual implementation. I have written stronger tests (3 of them fails with system.deepCopy, because they involved typed ptr). And made some correction on types I haven't tested before. I have notably switched the alreadyCopied variable from a HashSet to a Table to handle more complex cyclic cases.
However I fail on https://github.com/nim-lang/Nim/blob/version-2-0/tests/system/tdeepcopy.nim, because ARC release too early some linked list data (with .cursor. pragma), causing a nil deference. And i really don't understand why. (I could obliviously pass if I assume the same ref object can appear only once in a data structure, but I don't think it's good). However I struggle to find where precisely it goes wrong, so that to find a good solution.
If someone more competent on arc memory model internals regarding acyclic data structures could help me, it would be great. Otherwise, I won't go much farther, it's already quite solid and efficient for my needs, and also surely for some other peoples (and I have already spent too much time working on it !)
import std/[tables, sets]
## Look at "## Here is the code causing issue"
func deepEqualImpl[T](a, b: T, alreadyCompared: var HashSet[(pointer, pointer)]): bool =
result = true
when a is seq or a is array:
for i in 0..high(a):
if not deepEqualImpl(a[i], b[i], alreadyCompared):
return false
elif a is cstringArray:
# Must be nil terminated
var L = 0
while a[L] != nil: inc(L)
let rawSize = L * sizeof(cstring)
if cmpMem(a, b, rawSize) != 0:
return false
elif a is pointer:
## Don't be ambigous even if both have same addr or are nil
error("Can't compare pointer of unknown size")
elif a is ref or a is ptr:
if a == nil or b == nil:
return a == b
else:
let ptrTuple = (cast[pointer](a), cast[pointer](b))
if ptrTuple in alreadyCompared:
return true
else:
alreadyCompared.incl ptrTuple
result = deepEqualImpl(a[], b[], alreadyCompared)
elif a is object or a is tuple:
for name, v1, v2 in fieldPairs(a, b):
if not deepEqualImpl(v1, v2, alreadyCompared):
return false
else:
return a == b
func deepEqual*[T](a, b: T): bool =
## Check values recursively
## Ignore References/adresse inequality
var alreadyCompared: HashSet[(pointer, pointer)]
deepEqualImpl(a, b, alreadyCompared)
proc deepCopyImpl[T](dest: var T; src: T, alreadyCopied: var Table[pointer, pointer]) =
# An optimisation could be made if it is possible to have the info about if object is acyclic (compiler knows it ?)
when src is seq or src is array:
dest = src
for i in 0..high(src):
deepCopyImpl(dest[i], src[i], alreadyCopied)
elif src is cstringArray:
# Must be nil terminated
var L = 0
while src[L] != nil: inc(L)
let rawSize = L * sizeof(cstring)
dest = cast[cstringArray](alloc(rawSize))
copyMem(dest, src, rawSize)
elif src is pointer:
error("Can't copy pointer of unknown size")
elif src is ref or src is ptr:
if src != nil:
let
srcPtr = cast[pointer](src)
associatedDest = alreadyCopied.getOrDefault(srcPtr, nil)
if associatedDest != nil:
when src is ptr:
dest = cast[T](associatedDest)
else:
## Here is the code causing issue
dest = new T
dest[] = cast[T](associatedDest)[]
## End of portion causing issue
else:
when src is ptr:
dest = cast[T](alloc(sizeof(src)))
else:
dest = new T
alreadyCopied[srcPtr] = cast[pointer](dest)
dest[] = src[]
deepCopyImpl(dest[], src[], alreadyCopied)
elif src is object or src is tuple:
for _, v1, v2 in fieldPairs(dest, src):
deepCopyImpl(v1, v2, alreadyCopied)
else:
# cstring is copied here
dest = src
proc deepCopy*[T](dest: var T; src: T) =
## This procedure copies values and create new object for references
## Also copies typed pointer, so unmanaged memory is unsafe if pointer is not wrapped into an object with a destructor
# Should behave exactly like system.deepCopy
dest = src
var alreadyCopied: Table[pointer, pointer]
deepCopyImpl(dest, src, alreadyCopied)
import ../src/aloganimisc/deep
import std/[sets, strtabs, tables, sequtils]
import std/unittest
var initIncrement = 1
test "Acyclic data structure":
type
TestEnum = enum
AEnum, BEnum, CEnum
TestRef = ref object
field1: int
TestObj = object
refField: TestRef
MainRef = ref object
intField: int
strField: string
cstrField: cstring
tupleField: (TestObj, string)
setField: set[TestEnum]
objField: TestObj
refField: TestRef
nilField: TestRef
arrayField: array[1, TestObj]
seqField: seq[TestObj]
strtabField: StringTableRef
tableField: Table[string, TestRef]
stringSeq: seq[string]
cstrArray: cstringArray
nilPtr: TestRef
intRef: ref int
intPtr: ptr int
strRef: ref string
strPtr: ptr string
proc new(T: type TestRef): T =
result = TestRef(field1: initIncrement)
initIncrement += 1
proc init(T: type TestObj): T =
result = TestObj(refField: TestRef.new())
proc new(T: type MainRef): T =
var
strVal = "Hello"
stringSeq = @["World"]
intRef = new int
strRef = new string
intRef[] = 5
strRef[] = "from nim"
result = MainRef(
intField: 42,
strField: strVal,
cstrField: strVal.cstring,
tupleField: (TestObj.init(), "tuple"),
setField: { AEnum, BEnum },
objField: TestObj.init(),
refField: TestRef.new(),
nilField: nil,
arrayField: [TestObj.init()],
seqField: @[TestObj.init()],
strtabField: {"KEY": "VALUE"}.newStringTable(),
tableField: {"KEY": TestRef.new()}.toTable(),
stringSeq: stringSeq,
cstrArray: allocCStringArray(stringSeq),
nilPtr: nil,
intRef: intRef,
intPtr: cast[ptr int](intRef),
strRef: strRef,
strPtr: cast[ptr string](strRef),
)
proc compare(a, b: TestRef) =
check a != b
check a.field1 == b.field1
proc compare(a, b: TestObj) =
check a != b # Because contains a ref field
compare(a.refField, b.refField)
proc compare(a, b: MainRef) =
var key: string
check a.intField == b.intField
check addr(a.strField) != addr(b.strField)
check a.strField == b.strField
check addr(a.cstrField) != addr(b.cstrField)
check a.cstrField == b.cstrField
compare(a.tupleField[0], b.tupleField[0])
check a.tupleField[1] == b.tupleField[1]
check a.setField == b.setField
compare(a.objField, b.objField)
compare(a.refField, b.refField)
check a.nilField == nil
check b.nilField == nil
compare(a.arrayField[0], b.arrayField[0])
compare(a.seqField[0], b.seqField[0])
check a.strtabField.len() > 0
key = a.strtabField.keys().toSeq()[0]
check a.strtabField[key] == b.strtabField[key]
check a.tableField.len() > 0
key = a.tableField.keys().toSeq()[0]
check a.tableField[key] != b.tableField[key]
check a.cstrArray != b.cstrArray # Fail with system.deepCopy(). Not copied ?
check a.cstrArray[0] == b.cstrArray[0]
check a.nilPtr == nil
check b.nilPtr == nil
check a.intRef != b.intRef
check a.intRef[] == b.intRef[]
check cast[int](a.intPtr) != cast[int](b.intPtr) # Fail with system.deepCopy() -> same address
check a.intPtr[] == b.intPtr[]
check a.strRef != b.strRef
check a.strRef[] == b.strRef[]
check cast[int](a.strPtr) != cast[int](b.strPtr) # Fail with system.deepCopy() -> same address
check a.strPtr[] == b.strPtr[]
let a = MainRef.new()
var b: MainRef
when false: # use --deepcopy:on and set to true
runBench("system"):
system.deepCopy(b, a)
compare(a, b)
b = nil
deep.deepCopy(b, a)
compare(a, b)
check deep.deepEqual(a, b)
test "Cyclic data structure":
type Node = ref object
last {.cursor.}: Node
next: Node
data: int
proc main() =
for i in 0..100:
var a = Node()
a.last = a
a.next = a
for i in 0..10:
var n = Node()
a.last.next = n
n.next = a
var aCopy: Node
deep.deepCopy(aCopy, a)
check aCopy.data == a.data
check aCopy.next.data == a.next.data
check aCopy.next.next.data == a.next.next.data
check aCopy.next.next.next.data == a.next.next.next.data
check deep.deepEqual(aCopy, a)
aCopy.data = 42
check not deep.deepEqual(aCopy, a)
main()
#[
## From https://github.com/nim-lang/Nim/blob/version-2-0/tests/system/tdeepcopy.nim
discard """
matrix: "--mm:refc; --mm:orc --deepcopy:on"
output: "ok"
"""
import lists #tables, lists
type
ListTable[K, V] = object
valList: DoublyLinkedList[V]
table: Table[K, DoublyLinkedNode[V]]
ListTableRef*[K, V] = ref ListTable[K, V]
proc initListTable*[K, V](initialSize = 64): ListTable[K, V] =
result.valList = initDoublyLinkedList[V]()
result.table = initTable[K, DoublyLinkedNode[V]]()
proc newListTable*[K, V](initialSize = 64): ListTableRef[K, V] =
new(result)
result[] = initListTable[K, V](initialSize)
proc `[]=`*[K, V](t: var ListTable[K, V], key: K, val: V) =
if key in t.table:
t.table[key].value = val
else:
let node = newDoublyLinkedNode(val)
t.valList.append(node)
t.table[key] = node
proc `[]`*[K, V](t: ListTable[K, V], key: K): var V {.inline.} =
result = t.table[key].value
proc len*[K, V](t: ListTable[K, V]): Natural {.inline.} =
result = t.table.len
iterator values*[K, V](t: ListTable[K, V]): V =
for val in t.valList.items():
yield val
proc `[]=`*[K, V](t: ListTableRef[K, V], key: K, val: V) =
t[][key] = val
proc `[]`*[K, V](t: ListTableRef[K, V], key: K): var V {.inline.} =
t[][key]
proc len*[K, V](t: ListTableRef[K, V]): Natural {.inline.} =
t[].len
iterator values*[K, V](t: ListTableRef[K, V]): V =
for val in t[].values:
yield val
proc main() =
type SomeObj = ref object
for outer in 0..1:
let myObj = new(SomeObj)
let table = newListTable[int, SomeObj]()
table[0] = myObj
for i in 1..10_000:
table[i] = new(SomeObj)
var myObj2: SomeObj
for val in table.values():
if myObj2.isNil:
myObj2 = val
doAssert(myObj == myObj2) # passes
#doAssert deepEqual(myObj, myObj2)
var tableCopy: ListTableRef[int, SomeObj]
deep.deepCopy(tableCopy, table) # -> fail here
let myObjCopy = tableCopy[0]
var myObjCopy2: SomeObj = nil
for val in tableCopy.values():
if myObjCopy2.isNil:
myObjCopy2 = val
#echo cast[int](myObj)
#echo cast[int](myObjCopy)
#echo cast[int](myObjCopy2)
doAssert(myObjCopy == myObjCopy2) # passes
#doAssert deepEqual(myObjCopy, myObjCopy2)
main()
echo "ok"
]#
Both system and this deepCopy
have a peculiar behaviour when dealing with dynamic libraries. I'm implementing a hot reload workflow that copies some state over from one generation of the library to another. I noticed that when the copied structures contain strings, accessing these strings after the old version of the library was unloaded causes SIGSEGV: Illegal storage access. (Attempt to read from nil?)
. Fortunately, I was able to workaround this problem by adding a quick hack to the implementation presented in the message above:
elif src is string:
dest = "" & src
I'd appreciate any advice on how to fix that problem properly. For the reference, I'm using Nim v2.0.4 with --mm:orc
Your workaround is very smart and ORC does not copy string literals assuming that the process lives forever which is not true for your use case. My advice: Add a flag to deepCopy
like fromForeignProcess: bool
to document clearly what is going on.
In other words, I think there is no way to fix that "problem properly" and even cstring
string literals are affected.
Summary
Hello,
I propose a deepcopy implementation using a fieldPairs iterators. It seems to work for all types and is 3 times quicker than the actual one. It has not been tested on cyclic data structures.
Here is the code below :
Here is also the code for a deepEqual checking implementation :
Here are some unittests :
Description
The actual deepcopy implementation is slow and require a supplementary compile flag.
Alternatives
No response
Examples
No response
Backwards Compatibility
No response
Links
No response