Closed sinkingsugar closed 3 years ago
for casting, see https://github.com/status-im/nim-stew/blob/50562b515a771cfc443557ee8e2dceee59207d52/stew/byteutils.nim#L127 - ie the idea is to transition them into function calls so that we can do something sane about it in the future - the conversion in the other direction might still be missing though - for now, like nim does, we assume a neutral world of byte sequences and no special meaning for utf8 etc.
a core issue is that string usage is prolific in nim even when byte sequence would be more appropriate - again, the idea is that we slowly clean that up by creating alternatives in stew
so that we can test what a role model solution would look like in practise.
Yeah that's better (the toBytes) I guess, at least it's in a place that can be localized and triaged if something happens. Still it's far from safe, I think the GC references (on the backing memory) likely get increased/decreased/messed so far from noop
.
But the cast[string]
for example doubt would work with arc
or strv2
https://github.com/nim-lang/Nim/blob/7ec7731f824b933cdd4f4a1f816c1f3e4862bef6/lib/system/strs_v2.nim#L20
As length is before the payload (and payload cap is before etc..etc..)!
Completely true and agree on the string usage that randomly pops :), something like rust &[u8]
a slice or basically a string view would be nice.
afair the cast still has value semantics so there's a copy made, but feel free to correct me (and the comment on that function, maybe) - re arc, no idea, but getting rid of the casts is small step in a reasonable direction that works today.
Ah very true, you are completely right, there is likely a copy indeed! So basically no point on cast
, just implementing a clean copy is basically the same and future proof!
Most code accepting read-only inputs should prefer using openarray
(a.k.a. slice/view/span). Strings offer toOpenArrayByte
which is a safe and legal way to get the bytes of a string. The current issue is that you cannot use openarray parameters in async functions.
Casting a string
to seq[byte]
is OK.
Casting a seq[byte]
to string
is definitely not OK.
The strings created by casting are not terminated by '\0'
and will cause buffer overflow with C APIs.
This already happened in https://github.com/status-im/nim-http-utils/issues/8 and fixed in https://github.com/status-im/nim-http-utils/pull/9 by using the following procedure instead of cast:
proc toString(data: openarray[byte], start, stop: int): string =
## Slice a raw data blob into a string
## This is an inclusive slice
## The output string is nul-terminated for raw C-compat
let len = stop - start + 1
result = newString(len)
copyMem(result[0].addr, data[start].unsafeAddr, len)
It has the same performance properties as casting for lvalues, for rvalues casting would avoid the copy but openarray would work as well.
If casting to string is desired, it should be a type NotNulString = distinct string
to ensure it is never used as a public string or passed to cstring / C-API
Cool, was thinking to add something like that @mratsim , maybe can be added into stew
or something like that!
@zah Hmm, I didn't use async so much lately, is that a limitation in the current implementation for some reason?
The strings created by casting are not terminated by '\0' and will cause buffer overflow with C APIs.
Not implicitly AFAIK, but if converting from string to byte array/seq and back to strings, it should still be safe? This is mostly how this is being used across the codebase.
@zah, I'm also not aware of issues with async and openarray, do you remember what they are @cheatfate might also have an idea.
In general I'm all for cleaning this up, but I was under the impression that this sort of casting was mostly OK in Nim and there were some guarantees to keep seqs and strings casts compatible?
@sinkingsugar, @dryajov, please read through this rather long RFC where I've explained the problems in detail: https://github.com/status-im/nim-chronos/issues/2
This is addressed by the latest style guide outlined in https://github.com/status-im/nimbus-eth2/pull/2249.
Since we are talking about "sanitizing" and "security" for instance https://github.com/status-im/nimbus/issues/164 I'm collecting here a list of risky code. Not doing any change yet, but keeping it under the spotlight and up for discussion.
Wild casting
This is an example of bad practice, while
string
,seq[byte]
,seq[uint8]
andTaintedstring
are all equivalent at a low-level, this is something that might change, will likely change out of our control. In this specific case bothseq
andstring
are also basically references (not exactly but similar behavior for the compiler). Moreover is lossy as there is no UTF8 sanity check either. https://github.com/status-im/nim-libp2p/blob/88a030d8fbd76023354f14ff10ba740786eb46a4/libp2p/muxers/mplex/mplex.nim#L88Wild gcsafe
https://github.com/status-im/nim-libp2p/issues/68 I see way too many
ref
and way too many unsafegcsafe
new
patternI'm 50/50 on this.. I actually like the idea of
newX
expressing heap/ref allocations, but I often hear we want to change it intoT.init
(cc @arnetheduck ) I made a list of thosenewX
in our code base (some are system.nim)