I'll give two examples from our codebase (out of many more):
Creating types such as BitsRange and NibblesRange that reference a backing openarray and allow you to access the bits and nibbles of a sliceable range of memory. Such types are used when traversing Ethereum Patricia Tries, for example. You can see some example code here.
Creating a stream type that can read/write a buffer of memory (i.e. openarray) or a file, while providing you with a single interface for both.
If we have to generalise, let's see what's common between these examples - they both represent types that hide an underlying mutable pointer behind a safer high-level API. The features discussed here will enable you to define and instantiate such types on the stack in a way that will allow the compiler to ensure that memory-safety is preserved (something that our existing solutions don't solve).
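To make the first example concrete, here is a minimal sketch of the kind of view type this would allow (hypothetical names and fields, not the actual Nimbus code; it relies on openarray being usable as an object field, which is exactly what this proposal enables):
type
  NibblesRange = object
    bytes: openarray[byte]   # borrowed view over the backing memory
    start: int               # offset, in nibbles, of the first nibble

proc nibble(r: NibblesRange, i: int): byte =
  ## Returns the i-th 4-bit nibble of the viewed memory.
  let pos = r.start + i
  if (pos and 1) == 0:
    r.bytes[pos shr 1] shr 4
  else:
    r.bytes[pos shr 1] and 0x0F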
The other major new capability will be using these types as return values. Iterators like strutils.split will now return openarrays instead of strings, which will make a whole lot of code much more efficient (fewer memory allocations, less copying).
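As a sketch of what that could look like (the tokens iterator below is illustrative, not the actual future strutils API; it assumes iterators may yield openarray views):
iterator tokens(s: string, sep: char): openarray[char] =
  ## Yields borrowed slices of `s`; no string is allocated per token.
  var i = 0
  while i < s.len:
    var j = i
    while j < s.len and s[j] != sep: inc j
    yield s.toOpenArray(i, j - 1)   # a view into `s`, no copy
    i = j + 1

for tok in tokens("foo bar baz", ' '):
  echo tok.len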
And finally, C++ programmers will have fewer reasons to complain when they discover that Nim doesn't allow them to use the following familiar pattern for avoiding copies when using container accessors:
const SomeType& foo = someVector[i];
Work-arounds with templates or unsafe usages of addr will be reduced.
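Under the proposed rules, the rough Nim equivalent would be an ordinary lent binding. A minimal sketch, assuming lent locals become legal (SomeType and someVector are placeholder names):
type SomeType = object
  data: array[64, byte]

proc useWithoutCopy(someVector: seq[SomeType], i: int) =
  # A local `lent` binding borrows the element in place instead of copying it,
  # mirroring the C++ const-reference idiom above.
  let foo: lent SomeType = someVector[i]
  echo foo.data[0]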
A borrowed type cannot be used for (local or global or threadlocal) variables or consts. It must be used as a parameter or return type.
There is no need to require that borrowed types must not be used for local variables. In fact, most of the benefits will be gained when this becomes possible. Here is how you can lift this restriction:
There are only two ways a borrowed value can be created:
It can be received as an input parameter (please note that besides var and openarray this includes the regular read-only non-ref parameters).
It can be obtained as a return value from a function.
Let's focus on option 2. Consider code like the following:
proc someProc(borrowedFoo: Foo) =
let borrowedBarSlice = getBarSlice(borrowedFoo)
useSlice(borrowedBarSlice)
We can safely rewrite it like this:
proc someProc_continuation(barSlice: openarray[Bar]) =
useSlice(barSlice)
proc someProc(borrowedFoo: Foo) =
someProc_continuation(getBarSlice(borrowedFoo))
In other words, we can treat everything in the basic block that follows the borrowing of a return value as a new function taking the borrowed value as an extra input.
Where the danger lies is in invalidating the borrowed values. Consider the following classic example:
proc dangerous(x: var seq[Foo]) =
let elementReference: lent Foo = x[10]
x.setLen 0 # The memory backing the foo reference is destroyed
# Let's assume that this will be devastating for the next line.
echo elementReference.someField
Even this example can be rewritten to have the same hazards in the current version of Nim:
proc dangerous_continuation(elementReference: Foo, x: var seq[Foo]) =
x.setLen 0 # The memory backing the foo reference is destroyed.
# Let's assume that this will be devastating for the next line.
echo elementReference.someField
proc dangerous(x: var seq[Foo]) =
dangerous_continuation(x[10], x)
This goes to show that the continuation transformation works in both directions: any unsafe code can be expressed with or without the local-variable restriction. The solution here is to introduce a more powerful borrow checker that will enforce that every mutable alias to a particular "backing resource" is unique: you can either have one mutable alias or multiple read-only ones. The compiler's write tracking can help you enforce this on a more granular level. The notions of a "backing resource" and a "pointer to it" can be generalized to allow the user to introduce custom user-defined types following the same rules.
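As an illustration of the "one mutable alias or multiple read-only ones" rule (hypothetical diagnostics, not current compiler behaviour), code like the following would have to be rejected, because a mutable alias and a read-only borrow of the same backing seq are live at the same time:
proc growAndClear(x: var seq[int]) =
  x.setLen 0

proc example(s: var seq[int]) =
  let view: lent int = s[0]   # read-only alias of s's backing storage
  growAndClear(s)             # second, mutable alias of the same storage: rejected
  echo view                   # the read-only borrow is still live here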
Even this example can be rewritten to have the same hazards in the current version of Nim
This problem is actually covered in https://nim-lang.org/docs/manual_experimental.html#aliasing-restrictions-in-parameter-passing and can be solved differently: if the compiler cannot prove there is no dangerous aliasing involved, it can pass a real copy to the Foo value parameter.
This would also allow to implement C++/D ranges without resorting to raw pointers and ensuring that they don't escape.
We could rewrite a couple of algorithms in strutils and sequtils in a way that is ergonomic (functional-like) and composable, but without the overhead of intermediate allocations and without the need for the It templates like mapIt.
In the Nimbus codebase, every place where we use this code https://github.com/status-im/nim-stew/blob/master/stew/ranges/memranges.nim exists because we couldn't use openarray as a value type.
Obviously there are all the existing RFCs that must not be forgotten:
This problem is actually covered in https://nim-lang.org/docs/manual_experimental.html#aliasing-restrictions-in-parameter-passing and can be solved differently: If the compiler cannot prove there is no dangerous aliasing involved, it can pass a real copy to the Foo value parameter.
Good to see that spec, but my example was attempting to demonstrate that there is an equivalence between the problems. To enforce the referenced spec and to lift the "no local variables" restriction, you have to solve the same problem. The implicit copying work-around can be applied in both cases, and changing my example to use var creates code that must be outlawed in both cases:
proc dangerous_continuation(elementReference: var Foo, x: var seq[Foo]) =
x.setLen 0 # The memory backing the foo reference is destroyed.
# Let's assume that this will be devastating for the next line.
echo elementReference.someField
proc dangerous(x: var seq[Foo]) =
dangerous_continuation(x[10], x)
I have another example. A while ago I had a discussion with @krux02 about the interface code he wrote, which I later massacred to use GC refs in order to make it safe. This change would make that unnecessary (maybe not always), because it can be ensured that the interface object doesn't outlive the implementing object.
I have one more. What this RFC should allow is for closures to capture lent T / var T variables by borrowing, without a copy. The same way objects will have borrowed fields, closures would have borrowed fields too, and that makes the closures borrowed as well.
Currently, lent T / var T can't be captured, and users complain about the lack of this functionality.
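A minimal sketch of what such a capture could look like (today's compiler rejects this with an "illegal capture" error):
proc makeIncrementer(counter: var int): proc () =
  # Under this RFC the closure would get a borrowed field pointing at `counter`,
  # which makes the closure itself a borrowed value that must not outlive
  # the location `counter` was borrowed from.
  result = proc () =
    inc counter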
A practical example of something that is not possible today:
I want to write a big integer library.
Typical use case: as the backend of a cryptographic library, for example for TLS, Blockchain or Zero-Knowledge Proofs. I expect zero-knowledge proofs to become as important as TLS in the future, and there are only a few libraries and very few languages suitable for writing those.
The resulting BigInt type would be
type BigInt = object
limbs: openarray[uint64]
And for cryptography/modular bigint the type would be
type BigInt = object
bitSize: uint64
limbs: openarray[uint64]
Currently we can use static; there is a proof-of-concept in Constantine: https://github.com/mratsim/constantine/blob/ff8b22e1d179844a8b73df252b19c2b7449210e1/constantine/bigints.nim#L42-L74
type Word* = uint64
const WordBitSize* = sizeof(Word) * 8 - 1
## Limbs are using only 63-bit to avoid carries
func wordsRequired(bits: int): int {.compileTime.}=
(bits + WordBitSize - 1) div WordBitSize
type
BigInt*[bits: static int] = object
## Fixed-precision big integer
# The word size is 63-bit to avoid carries
# "Limb-endianess" is little-endian (least significant limb at BigInt.limbs[0])
limbs*: array[bits.wordsRequired, Word]
However, due to the use of static, each operation like add or == will create duplicated code for each bits value in use.
There is a tradeoff here.
However, it is very probable that having the following data structure
type BigInt = object
bitSize: uint64
limbs: openarray[uint64]
which avoids code monomorphization at the expense of 1 word (bitSize) or 2 words (bitSize plus openarray.len), will lead to a smaller total size (library size + runtime size).
For this use-case:
{.push boundChecks:off.} is compatible with {.push raises:[].} to have compiler guarantees.
The length part of the openarray is actually unneeded, as we already have to track bitSize for modular big integers.
An alternative way would be to associate a lifetime with the pointer field somehow, but that goes deep into borrow-checker territory and is unnecessary in a first iteration, i.e. something like this:
type LifeTime = object
type BigInt[x: LifeTime] = object
bitSize: uint64
limbs: bound(x, ptr UncheckedArray[uint64])
but maybe lent captures would work?
Range-like transformations similar to D-range or C++20-ranges
In particular this enables functional-style (map, filter, fold, reduce, take, drop, ...) including on infinite ranges without:
requiring closure iterators, which allocate on the heap and are a bit unwieldy to use, for example:
proc takeWhile*[T](iter: iterator(): T, cond: proc(x: T):bool): iterator(): T =
## .. code-block:: Nim
## takeWhile(1;2;3;4, proc(x: int): bool = x < 4) -> 1;2;3
result = iterator(): T {.closure.}=
var r = iter()
while not finished(iter) and cond(r):
yield r
r = iter()
proc takeWhile*[T](iter: openarray[T], cond: proc(x: T):bool): iterator(): T =
## Is this even valid or do we need seq[T]
result = iterator(): T {.closure.}=
for val in iter:
if not cond(val):
break
yield val
Note that with such ranges, we wouldn't have to build a state machine at compile-time like Rust or zero-functional. You would be able to create transformations at runtime as well, for example for a parser constructed from a json or protobuf definition.
Furthermore, we completely dodge the need for loop fusion or deforestation that seems so prevalent in functional languages, because there are no intermediate values by design.
Making strutils fast (and still composable)
Currently strutils is a huge performance trap. Nim beginners are using it, allocating all over the place thanks to the easy but allocation-heavy design.
And then they come to the forum saying that Nim is slower than Python, and someone says "you shouldn't do that, allocation is bad, etc., do it this way or use npeg".
We need a way to easily compose string operations and allocate only when necessary. Having openarray as a value would probably enable a library with a strutils-like API but much higher performance, so that string processing in Nim is not such a performance trap.
That is exactly the opposite of what I asked for, sorry. I asked for an example that would actually work with the outlined borrowing rules. Hint: borrowing means you have fewer owners, not no owners whatsoever, so I cannot see your BigInt example working out.
I don't get what you mean. There is an owner, the proc that allocates the backend buffers.
For example in pseudo-code:
type BigInt = object
# Internal type -> openarray to guarantee lifetime
bitSize: uint64
limbs: openarray[uint64]
type Curve* = enum
Curve25519
BN254
BLS12_384
Secp256k1
func getNumLimbs(C: static Curve): int =
discard "map curve -> # of bigint limbs"
func getSignatureSize(C: static Curve): int =
discard "map curve -> signature size"
type FiniteField[N: static int, C: static Curve] = object
## Extension Field: all arithmetic operations are done modulo a constant prime / polynomial extension
# Internal type -> openarray to guarantee lifetime
coefs: array[N, BigInt] # Polynomial coordinates in the extension field
type ECCPointG2[C: static Curve] = object
## Elliptic Curve coordinates on G2 curve
# Internal type -> openarray to guarantee lifetime
x: FiniteField[2, C]
y: FiniteField[2, C]
type Signature*[C: static Curve] = object
## Public type
raw: array[getSignatureSize(C), byte]
proc hashToEllipticCurveG2(point: var ECCPointG2, msg: openarray[byte]) =
discard "computation"
proc compressSignature(point: ECCPointG2): Signature =
discard "compression"
proc signMsg(msg: openarray[byte], C: static Curve): Signature[C] =
  var g2x_fp2x, g2x_fp2y, g2y_fp2x, g2y_fp2y: array[getNumLimbs(C), uint64]
  var point = ECCPointG2[C](
    x: FiniteField[2, C](coefs: [g2x_fp2x, g2x_fp2y]),
    y: FiniteField[2, C](coefs: [g2y_fp2x, g2y_fp2y])
  )
# This proc is the owner of all computation buffers
# We ensure that all computations are done without allocation
# This includes bigints, modular bigints and elliptic curve maths
# and that the temporary buffers don't escape
point.hashToEllipticCurveG2(msg)
result = point.compressSignature()
Ok, thank you. This example is precise enough that I can see my somewhat simplistic borrow rules would work for you.
Other simple things that would be allowed by having first-class openarrays:
For instance, I would benefit from this for some cryptographic APIs:
func aggregateVerify*[T: char or byte](
publicKeys: openarray[PublicKey],
messages: openarray[openarray[T]],
signature: Signature): bool =
## Check an aggregated signature over several (publickey, message) pairs
## returns `true` if the signature is valid, `false` otherwise.
discard
func aggregateVerify*[T: char or byte](
publicKey_msg_pairs: openarray[tuple[publicKey: PublicKey, message: openarray[T]]],
signature: Signature): bool =
## Check an aggregated signature over several (publickey, message) pairs
## returns `true` if the signature is valid, `false` otherwise.
discard
This RFC has been implemented. View types have a reasonably precise specification and are far more flexible than what was outlined in this RFC. The implementation is reasonably stable and they are used successfully in the wild, see for example https://forum.nim-lang.org/t/6968#43685
The memory-safety holes that were found (and that affect view types) are covered by the section "Aliasing restrictions in parameter passing" here. We hope to have an implementation for this soon, but it should be covered by a milestone of its own, as it is an existing, well-known problem for an Ada-like language. (With an elegant solution too, which I didn't outline in the experimental manual.)
Is there documentation of the view types as implemented?
Is there documentation of the view types as implemented?
It's in the experimental manual
Nim currently has 3 types that are second-class citizens due to memory safety reasons: var T, lent T, openArray[T]. These can only be used in parameter lists, as the underlying implementation is that of a borrowed pointer that must not outlive the callee. See https://nim-lang.org/docs/manual.html#procedures-var-return-type for details about how this is done for var T.
This proposal is about extending the capabilities of these types so that they are allowed inside other types like objects or tuples. A type containing a borrowed pointer (var T / lent T / openArray) becomes borrowed too, and a value of such a type must not outlive the location it was borrowed from. This is enforced statically. How exactly this static enforcement will work remains to be seen, but the rules of the first implementation will be:
A borrowed type cannot be used for (local or global or threadlocal) variables or consts. It must be used as a parameter or return type.
If used as a return type, the result must be borrowed from the first parameter unless explicitly specified otherwise via a from declaration.
In other words, exactly the same rules apply that apply for https://nim-lang.org/docs/manual.html#procedures-var-return-type.
ToDo: Show what idioms this extension allows for. This RFC is in desperate need of practical examples.
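For reference, the existing var-return-type rule that the rules above mirror already works in current Nim:
proc firstElement(s: var seq[int]): var int =
  # The returned view is derived from the first parameter `s`,
  # so the compiler can ensure it does not outlive `s`.
  s[0]

var data = @[10, 20, 30]
firstElement(data) = 99
echo data   # @[99, 20, 30]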