Closed Clonkk closed 3 years ago
Yes! Finally we're bringing more of Nim's effect system into regular use, I love it.
when memory safety isn't an issue.
That's a poor way of phrasing it, it always is an issue. The problem is whether tools like memUnsafe
help more than they hurt. And IMO they hurt more than they help. You simply don't find many bugs with this mechanism, the bugs we should focus on are:
nil
pointers.This would be useful for any context where memory safety is required (embedded target without an MPU comes to mind).
What does that mean, memory safety is always important, but what is safe it defined by the language, not wether your target has an MPU.
How is importjs
memory unsafe ?.
That's a poor way of phrasing it, it always is an issue.
Probably, but communication is hard. The point is that there are situations where an unsafe memory access can have critical consequences.
The problem is whether tools like memUnsafe help more than they hurt. And IMO they hurt more than they help.
I agree that adding more friction for a situational needs is not a good solution. Which is why memUnsafe
doesn't do anything unless specifically marked memSafe
. It shouldn't break existing code or add more work to someone wrapping a library.
You simply don't find many bugs with this mechanism, the bugs we should focus on are: Array indexing out of bounds. Integer overflows. Derefrencing nil pointers.
This is subjective. I use FFI & cast
regularly but never had a case of integer overflow. The list of what should be marked unsafe can be expanded if it's relevant.
These are not mutually exclusive and I don't see why limit the tools at our disposal (again, given it does not add more friction).
what is safe it defined by the language
So let's define it.
not wether your target has an MPU.
The consequences of an unsafe memory access depends (among other things) on your OS and target.
How is importjs memory unsafe ?.
I added importjs
and importobjc
mostly for uniformization with importc
and importcpp
. (I only use c & c++ backend). Depending on how pointers are handled in JS, it may not be relevant to define it as unsafe.
Yes, but DrNim can detect "array index out of bounds" and can do everything that Nim's effect system can do. I encourage contributions to DrNim... Maybe we can discuss these things on IRC/gitter/whatever suits you.
I haven't looked in details at DrNim but safety by formal verification does sound interesting. I usually use Matrix, is there a DrNim channel to join ?
Personally, I don’t feel like memUnsafe/safe can be very useful, I’d rather have other effects like “writes” (whose discussion has stalled a lot https://nim-lang.org/araq/writetracking.html)
I assume this is missing raw memory management like alloc, alloc0, allocShared, create, createShared, c_malloc, free, freeShared, dealloc, deallocShared.
Also newSeqUninitialized, everything that creates an uninitialized ref and, ptr and ptr UncheckedArray.
I agree with Araq though, this is might be cheap in terms of semantics but formal verification is what you want in embedded. See my RFC #222 first part.
Personally, I don’t feel like memUnsafe/safe can be very useful, I’d rather have other effects like “writes” (whose discussion has stalled a lot https://nim-lang.org/araq/writetracking.html)
Er, no, it's shipping via --experimental:strictFuncs
I assume this is missing raw memory management like alloc, alloc0, allocShared, create, createShared, c_malloc, free, freeShared, dealloc, deallocShared.
I thought about that, but I think it's more relevant to mark the act of accessing memory as unsafe rather than the allocation / deallocation. i.e. I don't think a dangling pointer that you never access is an issue.
Also newSeqUninitialized, everything that creates an uninitialized ref and, ptr and ptr UncheckedArray.
UncheckedArray
completly agree - I didn't explictly wrote it because I assumed accessing an unchecked array was the same as derefenrencing a pointer.
A case can be made for uninitialized ref, and seq with -d:danger (or when not compileOption("boundChecks")
whichever is more relevant).
this is might be cheap in terms of semantics but formal verification is what you want
This RFC extend the existing effect system with notation that already exists and is documented. It's purely a semantic check with familiar syntax. It's a simple solution, easy to understand that shouldn't be a burden in maintenance (at least I hope).
There is no doubt that formal verification is a better, more rigorous solution in the long-run. I haven't looked at DrNim in details yet, but it's likely more complex and more expensive (there's also the new syntax / documentation cost but that's manageable I think) than an effect based solution. It's hard to have a clear idea of the cost of expanding and stabilizing DrNim into a "production ready" tool (and maintenaning it) :).
Well at least DrNim is a separate tool, so the complexity is "elsewhere" and doesn't affect Nim with more experimental features. More importantly though: I claim the additions are not "worth it", so every time you lose productivity you need to measure it and every time you win productivity because it found a bug. Then in the end we would have an objective measure. It's hard to do but right now we don't even have anectodes about how it helps/hurts. Now, this argues for "let's add it and see how it turns out" but in the past we never removed any new effect from Nim, I tried to remove some but failed. There always are voices which want to keep every little thing that we added, esp if it "could find a bug one day".
Imagine if DrNim is just ship like Nimpretty, it would make memUnsafe
almost redundant,
with help from not nil
, strictFuncs
, etc.
This seems pretty harmless to me. We can always deprecate/transform them out of code later, and any discussion of DrNim by comparison seems disingenuous at best -- DrNim has far less support (good luck compiling it) and is practically invisible as far as users are concerned. Are the annotations needed even documented? Are there examples?
Gate this behind yet another flag if you must; I'm sure @timotheecour would be happy to provide a PR to do so.
This seems pretty harmless to me.
What about:
proc myUnsafeProc() {.memUnsafe.} =
discard
let indirect: proc () = myUnsafeProc # allowed?
proc mySafeProc() {.memSafe.} =
indirect() # loophole here?
Fair. So what you want for composition is
type
SafeProc = concept c
c is proc
c.hasEffect not(memUnsafe)
Now, this argues for "let's add it and see how it turns out" but in the past we never removed any new effect from Nim, I tried to remove some but failed.
Merging code that is already planned to be removed is not a good practice, I agree.
any discussion of DrNim by comparison seems disingenuous at best -- DrNim has far less support (good luck compiling it) and is practically invisible as far as users are concerned. Are the annotations needed even documented? Are there examples?
From what I understand, DrNim should/will be able to deal with checking properties better than the current effect system can, so it's not absurd to compare it to an effect-based solution.
DrNim being heavily in develoment shouldn't be a reason to not invest in it. If it has been decided to move towards formal verification and that DrNim is part of Nim's future, then efforts spent improving it are worth considering.
Of course, remains the unanswered question of the timing : how long to implement this feature vs how long to stabilize DrNim ? Then again, it doesn't seem this RFC has sparked the interest of many people, so there's no rush.
@Araq:
The problem is whether tools like memUnsafe help more than they hurt. And IMO they hurt more than they help. You simply don't find many bugs with this mechanism
[Citation Needed] ;-) In Nim I pretty often find myself resorting to cast
and unsafeAddr
, and I've run into bugs I caused through using them. I get the impression other people use them too, since I see them used in workarounds for common problems like "how do I pass a seq[byte]
to a parameter that takes a string
?"
Yes, but DrNim can detect "array index out of bounds" and can do everything that Nim's effect system can do.
I'm interested in "unsafe" annotations less as a way to detect bugs, and more as a way of auditing modules I import and code I write, to try to constrain unsafe behaviors to limited areas.
In Nim I pretty often find myself resorting to cast and unsafeAddr, and I've run into bugs I caused through using them.
And you would run into less bugs if only they were inside an unsafe
code section? Come on, I know you don't believe that.
I'm interested in "unsafe" annotations less as a way to detect bugs, and more as a way of auditing modules I import and code I write, to try to constrain unsafe behaviors to limited areas.
When you audit code you can search for addr|cast|unsafeAddr
much like you can look for unsafe
. So constrain unsafe code to the procs that need it. I really dislike the idea of making Nim more complex just because you feel the need for more punishment for using the already ugly and verbose unsafe constructs.
An alternative proposal would be a tag for not allowing direct calls into importc
'ed functions. That would be much more useful (IMHO) and would actually address the valid point you raised on the forum.
[Citation Needed] ;-) In Nim I pretty often find myself resorting to
cast
andunsafeAddr
, and I've run into bugs I caused through using them. I get the impression other people use them too, since I see them used in workarounds for common problems like "how do I pass aseq[byte]
to a parameter that takes astring
?"
You certainly shouldn't cast or you would have a string that doesn't end with \0
and it will break in a strange manner in a C library https://github.com/status-im/nim-http-utils/pull/9
The short-term solution is to provide a conversion proc in sugar and the long-term solution is for all libraries that work on blobs to standardize on openarray[byte]
.
adding {.memSafe.}, {.memUnsafe.} would actually add a lot of complexity.
Here are a few problems: under this RFC:
proc fn1() {.memSafe.} = discard
proc fn2() {.memUnsafe.} = discard
proc bar(fn: proc()) = fn()
bar
represents sorting and fn represents a cmp
), or if the annotation is added, it introduces a breaking change (previously valid code bar(fn2)
would stop working)fn1
.So this forces you to have some kind of genericity in {.memSafe.} vs {.memUnsafe.}, but that has its own issues, eg for fwd declarations:
proc bar[Fn](fn: Fn) # should you annotate as {.memSafe.} or {.memUnsafe.}?
proc fn1() {.memSafe.} = discard
proc fn2() {.memUnsafe.} = discard
proc bar[Fn](fn: Fn) = fn()
and heuristics such as "if any argument is {.memUnsafe.}, mark it as {.memUnsafe.}" just don't cut it (too pessimistic).
note that some of those problems affect other effects too, but the problem is exacerbated with memory safety. In all likelihood, the benefit you'd get would be limited (I doubt this would help catch bugs that couldn't be caught otherwise), especially in comparison with the costs involved.
precedent: see D's analogous annotations @safe, @trusted, @system (https://dlang.org/spec/memory-safe-d.html#:~:text=Memory%20Safety%20for%20a%20program,never%20result%20in%20memory%20corruption.) which make the language significantly more complex for IMO limited benefit.
That's function coloring. It is justified in many case because it helps read the codebase or ensure some degree of safety but too much coloring and you fall in http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/.
That said, I think coloring for async/await or {.gcsafe.}
or even {.raises: [Defect].}
is absolutely worth it especially from a maintenance point of view in large codebase (https://lukasa.co.uk/2016/07/The_Function_Colour_Myth/).
Having a type system is a form of coloring your functions already. We need a balance between convenience and overkill.
From https://arxiv.org/pdf/2007.00752.pdf
Across our dataset, we find that a majority of crates include functions which are possibly unsafe. We also find that unsafe function calls are the most common use of unsafeness, and the unsafeness is through library dependencies rather than through the use of the keyword unsafe. Perhaps nonintuitively, we find that the most downloaded crates have more unsafe code than other crates. From these results, it is difficult for users to know if their code is safe, and thus we present recommendations for helping users understand when they are using Unsafe Rust in their software.
Which supports my view, fwiw, not even in Rust it's easy to tell whether you used something "unsafe". Whatever that even means, of course eventually you use loads and stores into memory...
The goal of this RFC is to add a
memUnsafe
effect to Nim to be able to trace unsafe operations in procs.This needs to be an effect a not a tag because it has to be applied by keyword of the language (
cast
) and pragmas (importc
,importcpp
,emit
etc.).What is "unsafe" ?
Unsafe operations can be narrowed down in Nim to :
cast
addr
,unsafeAddr
importc
,importcpp
,importjs
,importobjc
,emit
[]
or.
syntax (ideally speaking, hidden dereferencing might be hard to track)Description
memUnsafe
should not have any impact by default. No friction should be added for wrapping library or writing code when memory safety isn't an issue.memUnsafe
therefore relies onmemSafe
to disables compilations of anymemUnsafe
effects.To remove the
memUnsafe
effect,{.cast(memSafe).}
can be used exactly likegcSafe
.Thus we can make the above example compile :
This would be useful for any context where memory safety is required (embedded target without an MPU comes to mind).