vacp2p / nim-libp2p

libp2p implementation in Nim
https://vacp2p.github.io/nim-libp2p/docs/
MIT License
251 stars 55 forks source link

Code sanity #73

Closed sinkingsugar closed 3 years ago

sinkingsugar commented 4 years ago

Since we are talking about "sanitizing" and "security" for instance https://github.com/status-im/nimbus/issues/164 I'm collecting here a list of risky code. Not doing any change yet, but keeping it under the spotlight and up for discussion.

Wild casting

This is an example of bad practice, while string, seq[byte], seq[uint8] and Taintedstring are all equivalent at a low-level, this is something that might change, will likely change out of our control. In this specific case both seq and string are also basically references (not exactly but similar behavior for the compiler). Moreover is lossy as there is no UTF8 sanity check either. https://github.com/status-im/nim-libp2p/blob/88a030d8fbd76023354f14ff10ba740786eb46a4/libp2p/muxers/mplex/mplex.nim#L88

Wild gcsafe

https://github.com/status-im/nim-libp2p/issues/68 I see way too many ref and way too many unsafe gcsafe

new pattern

I'm 50/50 on this.. I actually like the idea of newX expressing heap/ref allocations, but I often hear we want to change it into T.init (cc @arnetheduck ) I made a list of those newX in our code base (some are system.nim)

131: newSeq
184: newException
2: newSkContext
43: newConnection
17: newDaemonApi
1: newStringTable
3: newStringOfCap
47: newFuture
1: newPool
7: newAsyncEvent
7: newString
2: newMultistreamHandshakeException
22: newMultistream
2: newInvalidVarintException
1: newInvalidVarintSizeException
2: newChannel
16: newLPStreamEOFError
3: newStreamInternal
4: newLPStreamLimitError
21: newMplex
13: newStream
4: newMuxer
5: newMuxerProvider
9: newSeqOfCap
9: newIdentify
4: newMessage
5: newTimedCache
6: newMCache
3: newAsyncLock
15: newPubSub
63: newBufferStream
12: newPubSubPeer
9: newNoise
2: newPlainText
2: newSecioConn
4: newSecio
7: newStandardSwitch
48: newTransport
5: newSwitch
2: newAlreadyPipedError
4: newNotWritableError
8: newLPStreamIncompleteError
2: newChronosStream
2: newAsyncStreamReader
2: newAsyncStreamWriter
6: newLPStreamReadError
9: newLPStreamIncorrectError
4: newLPStreamWriteError
6: newNoPubSubException
3: newTestSelectStream
2: newTestLsStream
2: newTestNaStream
arnetheduck commented 4 years ago

for casting, see https://github.com/status-im/nim-stew/blob/50562b515a771cfc443557ee8e2dceee59207d52/stew/byteutils.nim#L127 - ie the idea is to transition them into function calls so that we can do something sane about it in the future - the conversion in the other direction might still be missing though - for now, like nim does, we assume a neutral world of byte sequences and no special meaning for utf8 etc.

a core issue is that string usage is prolific in nim even when byte sequence would be more appropriate - again, the idea is that we slowly clean that up by creating alternatives in stew so that we can test what a role model solution would look like in practise.

sinkingsugar commented 4 years ago

Yeah that's better (the toBytes) I guess, at least it's in a place that can be localized and triaged if something happens. Still it's far from safe, I think the GC references (on the backing memory) likely get increased/decreased/messed so far from noop. But the cast[string] for example doubt would work with arc or strv2 https://github.com/nim-lang/Nim/blob/7ec7731f824b933cdd4f4a1f816c1f3e4862bef6/lib/system/strs_v2.nim#L20 As length is before the payload (and payload cap is before etc..etc..)!

Completely true and agree on the string usage that randomly pops :), something like rust &[u8] a slice or basically a string view would be nice.

arnetheduck commented 4 years ago

afair the cast still has value semantics so there's a copy made, but feel free to correct me (and the comment on that function, maybe) - re arc, no idea, but getting rid of the casts is small step in a reasonable direction that works today.

sinkingsugar commented 4 years ago

Ah very true, you are completely right, there is likely a copy indeed! So basically no point on cast, just implementing a clean copy is basically the same and future proof!

zah commented 4 years ago

Most code accepting read-only inputs should prefer using openarray (a.k.a. slice/view/span). Strings offer toOpenArrayByte which is a safe and legal way to get the bytes of a string. The current issue is that you cannot use openarray parameters in async functions.

mratsim commented 4 years ago

Casting a string to seq[byte] is OK. Casting a seq[byte] to string is definitely not OK. The strings created by casting are not terminated by '\0' and will cause buffer overflow with C APIs.

This already happened in https://github.com/status-im/nim-http-utils/issues/8 and fixed in https://github.com/status-im/nim-http-utils/pull/9 by using the following procedure instead of cast:

proc toString(data: openarray[byte], start, stop: int): string =
  ## Slice a raw data blob into a string
  ## This is an inclusive slice
  ## The output string is nul-terminated for raw C-compat
  let len = stop - start + 1
  result = newString(len)
  copyMem(result[0].addr, data[start].unsafeAddr, len)

It has the same performance properties as casting for lvalues, for rvalues casting would avoid the copy but openarray would work as well.

If casting to string is desired, it should be a type NotNulString = distinct string to ensure it is never used as a public string or passed to cstring / C-API

sinkingsugar commented 4 years ago

Cool, was thinking to add something like that @mratsim , maybe can be added into stew or something like that! @zah Hmm, I didn't use async so much lately, is that a limitation in the current implementation for some reason?

dryajov commented 4 years ago

The strings created by casting are not terminated by '\0' and will cause buffer overflow with C APIs.

Not implicitly AFAIK, but if converting from string to byte array/seq and back to strings, it should still be safe? This is mostly how this is being used across the codebase.

@zah, I'm also not aware of issues with async and openarray, do you remember what they are @cheatfate might also have an idea.

In general I'm all for cleaning this up, but I was under the impression that this sort of casting was mostly OK in Nim and there were some guarantees to keep seqs and strings casts compatible?

zah commented 4 years ago

@sinkingsugar, @dryajov, please read through this rather long RFC where I've explained the problems in detail: https://github.com/status-im/nim-chronos/issues/2

dryajov commented 3 years ago

This is addressed by the latest style guide outlined in https://github.com/status-im/nimbus-eth2/pull/2249.