nim-lang / RFCs

A repository for your Nim proposals.
135 stars 26 forks source link

Adding memoryUnsafe and memorySafe effects #314

Closed Clonkk closed 3 years ago

Clonkk commented 3 years ago

The goal of this RFC is to add a memUnsafe effect to Nim to be able to trace unsafe operations in procs.

This needs to be an effect a not a tag because it has to be applied by keyword of the language (cast) and pragmas (importc, importcpp, emit etc.).

What is "unsafe" ?

Unsafe operations can be narrowed down in Nim to :

Description

memUnsafe should not have any impact by default. No friction should be added for wrapping library or writing code when memory safety isn't an issue.

memUnsafe therefore relies on memSafe to disables compilations of any memUnsafeeffects.

proc myUnsafeProc() {.memUnsafe.} =
  discard

# normalProc can use myUnsafeProc without issues or any additional notation. 
# Side effect propagate and normalProc is marked memUnsafe but it has no effect until memSafe notation is used
proc normalProc() =
  myUnsafeProc()

# Does not compile
proc mySafeProc() {.memSafe.} =
  normalProc()

To remove the memUnsafe effect, {.cast(memSafe).} can be used exactly like gcSafe.

Thus we can make the above example compile :

proc myUnsafeProc() {.memUnsafe.} =
  discard

# normalProc is not unsafe since myUnsafeProc is used in a {.cast(memSafe).} scope
proc normalProc() =
  {.cast(memSafe).}:
    myUnsafeProc()
  echo "the rest of the proc"

# No problem
proc mySafeProc() {.memSafe.} =
  normalProc()

This would be useful for any context where memory safety is required (embedded target without an MPU comes to mind).

dom96 commented 3 years ago

Yes! Finally we're bringing more of Nim's effect system into regular use, I love it.

Araq commented 3 years ago

when memory safety isn't an issue.

That's a poor way of phrasing it, it always is an issue. The problem is whether tools like memUnsafe help more than they hurt. And IMO they hurt more than they help. You simply don't find many bugs with this mechanism, the bugs we should focus on are:

This would be useful for any context where memory safety is required (embedded target without an MPU comes to mind).

What does that mean, memory safety is always important, but what is safe it defined by the language, not wether your target has an MPU.

juancarlospaco commented 3 years ago

How is importjs memory unsafe ?.

Clonkk commented 3 years ago

That's a poor way of phrasing it, it always is an issue.

Probably, but communication is hard. The point is that there are situations where an unsafe memory access can have critical consequences.

The problem is whether tools like memUnsafe help more than they hurt. And IMO they hurt more than they help.

I agree that adding more friction for a situational needs is not a good solution. Which is why memUnsafe doesn't do anything unless specifically marked memSafe. It shouldn't break existing code or add more work to someone wrapping a library.

You simply don't find many bugs with this mechanism, the bugs we should focus on are: Array indexing out of bounds. Integer overflows. Derefrencing nil pointers.

This is subjective. I use FFI & cast regularly but never had a case of integer overflow. The list of what should be marked unsafe can be expanded if it's relevant. These are not mutually exclusive and I don't see why limit the tools at our disposal (again, given it does not add more friction).

what is safe it defined by the language

So let's define it.

not wether your target has an MPU.

The consequences of an unsafe memory access depends (among other things) on your OS and target.

How is importjs memory unsafe ?.

I added importjs and importobjc mostly for uniformization with importc and importcpp. (I only use c & c++ backend). Depending on how pointers are handled in JS, it may not be relevant to define it as unsafe.

Araq commented 3 years ago

Yes, but DrNim can detect "array index out of bounds" and can do everything that Nim's effect system can do. I encourage contributions to DrNim... Maybe we can discuss these things on IRC/gitter/whatever suits you.

Clonkk commented 3 years ago

I haven't looked in details at DrNim but safety by formal verification does sound interesting. I usually use Matrix, is there a DrNim channel to join ?

HugoP707 commented 3 years ago

Personally, I don’t feel like memUnsafe/safe can be very useful, I’d rather have other effects like “writes” (whose discussion has stalled a lot https://nim-lang.org/araq/writetracking.html)

mratsim commented 3 years ago

I assume this is missing raw memory management like alloc, alloc0, allocShared, create, createShared, c_malloc, free, freeShared, dealloc, deallocShared.

Also newSeqUninitialized, everything that creates an uninitialized ref and, ptr and ptr UncheckedArray.

I agree with Araq though, this is might be cheap in terms of semantics but formal verification is what you want in embedded. See my RFC #222 first part.

Araq commented 3 years ago

Personally, I don’t feel like memUnsafe/safe can be very useful, I’d rather have other effects like “writes” (whose discussion has stalled a lot https://nim-lang.org/araq/writetracking.html)

Er, no, it's shipping via --experimental:strictFuncs

juancarlospaco commented 3 years ago

https://github.com/nim-lang/Nim/issues/15265#issue-692587008

Clonkk commented 3 years ago

I assume this is missing raw memory management like alloc, alloc0, allocShared, create, createShared, c_malloc, free, freeShared, dealloc, deallocShared.

I thought about that, but I think it's more relevant to mark the act of accessing memory as unsafe rather than the allocation / deallocation. i.e. I don't think a dangling pointer that you never access is an issue.

Also newSeqUninitialized, everything that creates an uninitialized ref and, ptr and ptr UncheckedArray.

UncheckedArray completly agree - I didn't explictly wrote it because I assumed accessing an unchecked array was the same as derefenrencing a pointer. A case can be made for uninitialized ref, and seq with -d:danger (or when not compileOption("boundChecks") whichever is more relevant).

this is might be cheap in terms of semantics but formal verification is what you want

This RFC extend the existing effect system with notation that already exists and is documented. It's purely a semantic check with familiar syntax. It's a simple solution, easy to understand that shouldn't be a burden in maintenance (at least I hope).

There is no doubt that formal verification is a better, more rigorous solution in the long-run. I haven't looked at DrNim in details yet, but it's likely more complex and more expensive (there's also the new syntax / documentation cost but that's manageable I think) than an effect based solution. It's hard to have a clear idea of the cost of expanding and stabilizing DrNim into a "production ready" tool (and maintenaning it) :).

Araq commented 3 years ago

Well at least DrNim is a separate tool, so the complexity is "elsewhere" and doesn't affect Nim with more experimental features. More importantly though: I claim the additions are not "worth it", so every time you lose productivity you need to measure it and every time you win productivity because it found a bug. Then in the end we would have an objective measure. It's hard to do but right now we don't even have anectodes about how it helps/hurts. Now, this argues for "let's add it and see how it turns out" but in the past we never removed any new effect from Nim, I tried to remove some but failed. There always are voices which want to keep every little thing that we added, esp if it "could find a bug one day".

juancarlospaco commented 3 years ago

Imagine if DrNim is just ship like Nimpretty, it would make memUnsafe almost redundant, with help from not nil, strictFuncs, etc.

disruptek commented 3 years ago

This seems pretty harmless to me. We can always deprecate/transform them out of code later, and any discussion of DrNim by comparison seems disingenuous at best -- DrNim has far less support (good luck compiling it) and is practically invisible as far as users are concerned. Are the annotations needed even documented? Are there examples?

Gate this behind yet another flag if you must; I'm sure @timotheecour would be happy to provide a PR to do so.

Araq commented 3 years ago

This seems pretty harmless to me.

What about:


proc myUnsafeProc() {.memUnsafe.} =
  discard

let indirect: proc () = myUnsafeProc # allowed?

proc mySafeProc() {.memSafe.} =
  indirect() # loophole here?
disruptek commented 3 years ago

Fair. So what you want for composition is

type
  SafeProc = concept c
    c is proc
    c.hasEffect not(memUnsafe)
Clonkk commented 3 years ago

Now, this argues for "let's add it and see how it turns out" but in the past we never removed any new effect from Nim, I tried to remove some but failed.

Merging code that is already planned to be removed is not a good practice, I agree.

any discussion of DrNim by comparison seems disingenuous at best -- DrNim has far less support (good luck compiling it) and is practically invisible as far as users are concerned. Are the annotations needed even documented? Are there examples?

From what I understand, DrNim should/will be able to deal with checking properties better than the current effect system can, so it's not absurd to compare it to an effect-based solution.

DrNim being heavily in develoment shouldn't be a reason to not invest in it. If it has been decided to move towards formal verification and that DrNim is part of Nim's future, then efforts spent improving it are worth considering.

Of course, remains the unanswered question of the timing : how long to implement this feature vs how long to stabilize DrNim ? Then again, it doesn't seem this RFC has sparked the interest of many people, so there's no rush.

snej commented 3 years ago

@Araq:

The problem is whether tools like memUnsafe help more than they hurt. And IMO they hurt more than they help. You simply don't find many bugs with this mechanism

[Citation Needed] ;-) In Nim I pretty often find myself resorting to cast and unsafeAddr, and I've run into bugs I caused through using them. I get the impression other people use them too, since I see them used in workarounds for common problems like "how do I pass a seq[byte] to a parameter that takes a string?"

Yes, but DrNim can detect "array index out of bounds" and can do everything that Nim's effect system can do.

I'm interested in "unsafe" annotations less as a way to detect bugs, and more as a way of auditing modules I import and code I write, to try to constrain unsafe behaviors to limited areas.

Araq commented 3 years ago

In Nim I pretty often find myself resorting to cast and unsafeAddr, and I've run into bugs I caused through using them.

And you would run into less bugs if only they were inside an unsafe code section? Come on, I know you don't believe that.

I'm interested in "unsafe" annotations less as a way to detect bugs, and more as a way of auditing modules I import and code I write, to try to constrain unsafe behaviors to limited areas.

When you audit code you can search for addr|cast|unsafeAddr much like you can look for unsafe. So constrain unsafe code to the procs that need it. I really dislike the idea of making Nim more complex just because you feel the need for more punishment for using the already ugly and verbose unsafe constructs.

An alternative proposal would be a tag for not allowing direct calls into importc'ed functions. That would be much more useful (IMHO) and would actually address the valid point you raised on the forum.

mratsim commented 3 years ago

[Citation Needed] ;-) In Nim I pretty often find myself resorting to cast and unsafeAddr, and I've run into bugs I caused through using them. I get the impression other people use them too, since I see them used in workarounds for common problems like "how do I pass a seq[byte] to a parameter that takes a string?"

You certainly shouldn't cast or you would have a string that doesn't end with \0 and it will break in a strange manner in a C library https://github.com/status-im/nim-http-utils/pull/9

The short-term solution is to provide a conversion proc in sugar and the long-term solution is for all libraries that work on blobs to standardize on openarray[byte].

timotheecour commented 3 years ago

adding {.memSafe.}, {.memUnsafe.} would actually add a lot of complexity.

Here are a few problems: under this RFC:

proc fn1() {.memSafe.} = discard
proc fn2() {.memUnsafe.} = discard
proc bar(fn: proc()) = fn()

So this forces you to have some kind of genericity in {.memSafe.} vs {.memUnsafe.}, but that has its own issues, eg for fwd declarations:

proc bar[Fn](fn: Fn) # should you annotate as {.memSafe.} or {.memUnsafe.}?
proc fn1() {.memSafe.} = discard
proc fn2() {.memUnsafe.} = discard
proc bar[Fn](fn: Fn) = fn()

and heuristics such as "if any argument is {.memUnsafe.}, mark it as {.memUnsafe.}" just don't cut it (too pessimistic).

note that some of those problems affect other effects too, but the problem is exacerbated with memory safety. In all likelihood, the benefit you'd get would be limited (I doubt this would help catch bugs that couldn't be caught otherwise), especially in comparison with the costs involved.

precedent: see D's analogous annotations @safe, @trusted, @system (https://dlang.org/spec/memory-safe-d.html#:~:text=Memory%20Safety%20for%20a%20program,never%20result%20in%20memory%20corruption.) which make the language significantly more complex for IMO limited benefit.

mratsim commented 3 years ago

That's function coloring. It is justified in many case because it helps read the codebase or ensure some degree of safety but too much coloring and you fall in http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/.

That said, I think coloring for async/await or {.gcsafe.} or even {.raises: [Defect].} is absolutely worth it especially from a maintenance point of view in large codebase (https://lukasa.co.uk/2016/07/The_Function_Colour_Myth/).

Having a type system is a form of coloring your functions already. We need a balance between convenience and overkill.

Araq commented 3 years ago

From https://arxiv.org/pdf/2007.00752.pdf

Across our dataset, we find that a majority of crates include functions which are possibly unsafe. We also find that unsafe function calls are the most common use of unsafeness, and the unsafeness is through library dependencies rather than through the use of the keyword unsafe. Perhaps nonintuitively, we find that the most downloaded crates have more unsafe code than other crates. From these results, it is difficult for users to know if their code is safe, and thus we present recommendations for helping users understand when they are using Unsafe Rust in their software.

Which supports my view, fwiw, not even in Rust it's easy to tell whether you used something "unsafe". Whatever that even means, of course eventually you use loads and stores into memory...