bbarker commented 4 years ago

(WIP) Prim: A safe subset of Nim for pure functional programming

Note: I'm a very new Nim user so this is a major WIP RFC. I'll remove WIP from the title once I think I've filled my knowledge gaps, even if not all of the questions raised by the RFC are finalized. In the meantime, feel free to comment, especially if I've missed something that should be included or have said something that is untrue or just won't work.

Abstract

Less is often more when it comes to safety. Nim already supports most of the features needed for pure-functional programming (e.g. effect tracking), but some features still allow effects, by some definition: primarily assignment and other mutating constructs.

Compiling with --prim would turn on a set of restrictions, each of which could also be enabled individually.

Motivation

Programs written in a pure-functional style should elicit fewer bugs and surprises upon refactoring. Additionally, since Prim would be a subset of Nim, all Nim users could benefit from programs and libraries checked with Prim options.

Prim would allow an influx of programmers to Nim from languages such as Haskell, who want pure-functional programming but appreciate other features Nim has (lightweight compilation, scripting abilities, and others). Additionally, it would allow programmers to gain experience with pure-functional programming without the jarring syntax and theoretical principles of Haskell (note: I like Haskell syntax, but it is very different).

This RFC would make many of the features already present in Nim more actionable (removing mutative effects) and allow higher degrees of safety.

Additionally, much like Nim itself was meant to fill a gap, Prim could fill an interesting gap in functional programming: almost every pure-functional language is in the Haskell style. A Prim (Nim) program would be pure-functional, but in a wholly different style that is familiar to most programmers, just with certain features removed. To some extent, the result of code written in this style would be an evolving experiment.

Description

Hopefully most of these can be implemented by detection in the lexer and throwing a meaningful error when the relevant option is enabled.

Possible features to include:

[ ] No assignment/object update. This would also remove var parameters I think, which should definitely be done, as var parameters could easily elicit very surprising behavior. This can largely be achieved by removing the ability to use var, in the case of assignment. Motivation: these are essentially local (lexical?) effects and can make it more difficult to analyze software.
[ ] (low priority, under consideration: see comment) Disable for/while/if/case. These control flow mechanisms are all statements and not expressions, thus they too are generally only useful with effects and assignments. Pattern matching libraries or an if-else function could likely be used in their place, as could higher-order functions (funcs) like map, filter, take, and drop, and tail-recursive functions.
[ ] Disallow proc assignment to vars: can generally be difficult to understand as already outlined in the tutorial.
[ ] Disable break and continue - though they can be convenient, it can make control flow more difficult to analyze.
[ ] Similarly for early return: highly convenient but can result in surprises. But this one will require some alternatives, some of which already exist: tail-recursive functions, for instance. Also, higher-order functions like takeWhile, or something like Either.
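
As a rough illustration of the first and last items (a sketch only, not part of the proposal; `bump`, `bumped`, and `firstNegative` are hypothetical names), the following contrasts a `var` parameter with a value-returning `func`, and replaces a loop with an early return by helpers from `std/sequtils` and `std/options`:

```nim
import std/[sequtils, options]

# A `var` parameter lets the callee mutate the caller's variable; nothing at
# the call site signals that `counter` is about to change.
proc bump(x: var int) =
  x += 1

# The value-returning alternative: a `func` may not have side effects.
func bumped(x: int): int = x + 1

var counter = 1
bump(counter)            # counter is now 2
let two = bumped(1)      # the argument is untouched; the result is a new value

# Higher-order helpers instead of a loop with an accumulator.
let evenSquares = @[1, 2, 3, 4, 5, 6].filterIt(it mod 2 == 0).mapIt(it * it)

# An Option result instead of an early `return` from inside a loop.
func firstNegative(xs: seq[int]): Option[int] =
  let negatives = xs.filterIt(it < 0)
  if negatives.len > 0: some(negatives[0]) else: none(int)
```

Note that `filterIt` and `mapIt` still mutate a local accumulator internally; the point is only that the mutation is no longer visible to the caller.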

Other minor features to consider:

[ ] blocks do not appear to be expressions in any sense, so I'm not sure if they have a use in pure FP.
[ ] Shadowing? It is an idiom in Nim so probably want to keep it, especially since it only appears to work for var parameters, not vars in general. So getting rid of var parameters would, I think, also get rid of shadowing as it exists in Nim.
[ ] Disable iterator? - it can only be used from for anyway, and while if/case may be kept as they are also expressions, for and while seem less likely to be useful in this style. Also, an iterator is by definition not referentially transparent, though given the limited context of for, it is probably not very surprising. However, if iterators could be used in something like a list comprehension (and maybe they can), keeping them would be more appealing.
[ ] Mutable strings? Need to look into this (and refs) more first.
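
On the questions about blocks and iterators, here is a small sketch that may help. It assumes a Nim version whose `std/sugar` provides `collect`; `countTo` is a hypothetical iterator written for this example. It shows an iterator consumed in a comprehension-like form, and a `block` used in expression position:

```nim
import std/sugar

# A hypothetical inline iterator; the mutation of `i` stays local to it.
iterator countTo(n: int): int =
  var i = 1
  while i <= n:
    yield i
    inc i

# `collect` builds a seq from a `for` over an iterator, much like a list
# comprehension: keep the odd numbers and square them.
let oddSquares = collect(newSeq):
  for x in countTo(5):
    if x mod 2 == 1: x * x

# A block whose last expression becomes its value.
let product = block:
  let a = 2
  let b = 3
  a * b
```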

Areas I still need to look into: nil (reference types), OOP, macros.

I'd like to think there aren't any downsides to the proposal, aside from possibly adding a few lines of code here and there to the compiler to prohibit certain syntactic features (though some cases may require more; I'm not sure yet). Another possible downside is that code written without mutation might be less efficient in some cases. Hopefully these cases could be optimized away at some later point, and that shouldn't affect the RFC, as such code can already be written.

Examples

TBD

Backward incompatibility

None is planned.

Clyybber commented 4 years ago

The problem with such global restrictions/language dialects is that now library authors will be pressured into making their library compile with --prim, even if it means sacrificing performance or elegance. IMO Nim is about enabling you to write code; not forcing you to write it in a specific style. You could perhaps make a macro (or a term-rewriting macro) that asserts certain features are not used.

SolitudeSF commented 4 years ago

Disable for/while/if/case. These control flow mechanisms are all statements and not expressions

they are both

bbarker commented 4 years ago

Thanks for the thoughtful reply, @Clyybber:

The problem with such global restrictions/language dialects is that now library authors will be pressured into making their library compile with --prim, even if it means sacrificing performance or elegance.

I'm not sure about the pressure. If we look at the Python community, Python typings are (unfortunately) being largely ignored by many library authors, even though they have little or no impact on performance. As for performance, authors can always benchmark code that is expected to be performance critical, and if the --prim style is an issue, they could likely just compile that code without the --prim option and write other code with --prim. I haven't thought through how the boundaries between the two will work yet, though (I need to get more familiar with Nim first).

IMO Nim is about enabling you to write code; not forcing you to write it in a specific style.

I don't view this as forcing. It is just a different degree of linting, if you like. This actually gives us more options: I no longer like writing in Scala that much, for instance, since I can write in Haskell and have better type-safety guarantees. It isn't about forcing other people to write in a style: it's about checking myself, more than anything. Most people probably wouldn't use this option for now, but that doesn't mean it doesn't add value.

You could perhaps make a macro (or a term-rewriting macro) that asserts certain features are not used.

Thanks for the suggestion!

Ultimately, if a macro works, that could be fine, though maybe not quite as nice as having an option built into the compiler - if for no other reason than it isn't as apparent to users as an option. Though it would surely show how impressive Nim's macro system is!

Initially I was thinking of just playing around with this in a fork, but wanted to open up this RFC to get early feedback and gauge interest (and comments have already been useful!). I really don't think that forking would be in the best interest of anyone, though. Firstly, as I said, I think Nim is already close enough to what I have in mind for Prim to not need many changes. Secondly, people interested in writing in Prim (probably Haskell types, for now) could likely benefit from code already written in Nim, and since Prim code would be valid Nim code, all Nim users could certainly benefit from Prim code.

bbarker commented 4 years ago

@SolitudeSF Ah, yes, I just tried it, thanks for pointing that out.

  const res = if 1 == 2:
    1
  else:
    2

I suppose it could be left in, then, so I'll modify my initial bullet point regarding that soon. The only caveat is that they would still be just as problematic as they currently are already in procs. Though I'd like to ask experts here if there is (likely) an easy way to tell in the compiler if the if/case/while/for is generating an effect. Because then, they could still be used in procs as expressions.

To be honest, this seems like it is pretty low on my list at the moment given disabling var is an option (and since we have func!), so I'll take some time to mull it over.
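
For reference, a minimal sketch of `if` and `case` used purely as expressions inside a `func` (`classify` is a hypothetical name). Because `func` implies no side effects, the compiler would reject the body if a branch tried to call something like `echo`:

```nim
func classify(n: int): string =
  # `if` in expression position: every branch yields a value.
  let sign = if n < 0: "negative" elif n == 0: "zero" else: "positive"
  # `case` in expression position.
  let parity = case n mod 2
    of 0: "even"
    else: "odd"
  # The last expression is the result; no assignment to `result` is needed.
  sign & " " & parity
```

This only shows that the expression forms work without `var`; it does not answer the question of how the compiler could tell whether a given `for`/`while` statement produces an effect.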

Araq commented 4 years ago

It's not clear to me how attracting Haskell programmers would work out, Haskell is already a perfect Haskell. Also, pure functional programming is a radical idea that misses half of the picture: Shared, mutable state is hard to reason about, but if you have a single owner mutations are perfectly fine and superior for performance. I mean, you cannot even write a real Quicksort in pure FP. (No, that Haskell 4-liner is not Quicksort, study more CS theory if you don't believe me.)

bbarker commented 4 years ago

@Araq Thanks for the reply. And I'm well aware of what you refer to in Haskell with quicksort. Though when I was a freshman in college my TA did pull one on us with that. Think it was two lines. :-)

I'm a Haskell programmer. I'd like to think I'm welcome in the community (as well as other Haskellers more experienced and skilled than myself!). I don't always want to use Haskell as it is a very heavy-duty language when it comes to compiling code. Runtime isn't usually too bad, I've never had performance issues so far despite the lazy evaluation model ... but anyway, starting to get a bit off topic I guess.

Let me take some more time to look at the mutation model you refer to. I'm not experienced with Rust either, but realize it does offer some safety in this regard that obviates some of what pure FP offers.

mratsim commented 4 years ago

Haskell was the first language I dabbled into (well beyond BASIC and bash).

If we want to target a population of functional programmers, Scala and OCaml are the best languages to look into in my opinion. The syntax is similar, and they allow mutable state, let, references, ...

Impure functional programming (Scala, Erlang, F#, OCaml, even R) already has significant benefits without going all the way into category theory.

Furthermore, in terms of impact, effect tracking, mutability tracking and the Z3 SMT solver will significantly improve the quality of Nim code (whether structural like today or functional). I'd rather have efforts go into those than into creating new functional problems, namely:

documentation problems for monads
memory problems due to rampant closures and intermediate allocations
tail recursion problems: we have tail recursion in release mode, provided by the C compiler, but being pure FP probably means we need it at the Nim level
operator precedence problems due to left-to-right vs. right-to-left differences in arrow associativity
compiler problems for lazy infinite data structures

All are somewhat solvable, e.g. the last one can be done via closure iterators, but it would take significant effort when core devs are already scarce.
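
As a rough sketch of the closure-iterator idea for lazy infinite structures (`naturals` and `take` are hypothetical names, not from any library):

```nim
# An infinite stream of natural numbers; values are produced only on demand.
iterator naturals(): int {.closure.} =
  var n = 0
  while true:
    yield n
    inc n

# Hypothetical helper: pull at most `count` values from a closure iterator.
proc take(gen: iterator (): int {.closure.}, count: int): seq[int] =
  if count <= 0:
    return
  for x in gen():
    result.add x
    if result.len == count:
      break

let firstFive = take(naturals, 5)
```

The helper itself relies on mutation and `break`, which is the kind of code that would have to live outside a restricted subset while exposing a value-oriented interface to it.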

The way forward

I believe you can already start as a library:

https://github.com/vegansk/nimfp for core data structures
https://github.com/nigredo-tori/classy for functors and Haskell type-classes
```Nim
import classy, future

typeclass Functor, F[_]:
  # proc map[A, B](fa: F[A], g: A -> B): F[B]
  proc `$>`[A, B](fa: F[A], b: B): F[B] =
    # replace every element with the constant `b`
    fa.map((a: A) => b)

instance Functor, seq[_]
assert: (@[1, 2, 3] $> "a") == @["a", "a", "a"]
```


And you can create a `prim` macro via `{.push prim.} {.pop.}` that would recursively inspect for mutable assignment and the features you want disabled and create a compile-time error.
You can use [`freshIdentNodes`](https://github.com/nim-lang/Nim/blob/40ac19572a86b5bfa7b57cf0482ae3a30432176a/lib/pure/sugar.nim#L174-L187) as inspiration. It inspects and unbinds identifiers/symbols. It's easy to inspect for var assignments in the same way.
```Nim
import macros

proc freshIdentNodes(ast: NimNode): NimNode =
  # Replace NimIdent and NimSym by a fresh ident node
  # see also https://github.com/nim-lang/Nim/pull/8531#issuecomment-410436458
  proc inspect(node: NimNode): NimNode =
    case node.kind:
    of nnkIdent, nnkSym:
      result = ident($node)
    of nnkEmpty, nnkLiterals:
      result = node
    else:
      result = node.kind.newTree()
      for child in node:
        result.add inspect(child)
  result = inspect(ast)
```

It may even be possible to use term-rewriting templates to transform all var assignments into errors in a scope.

Pseudo code

```Nim
template primScope*(body: untyped): untyped =
  template preventVarAssignment{var a = b}(a, b: untyped): untyped =
    # Pseudo code, this probably doesn't compile
    {.error: "var assignment is not allowed in a prim scope".}
  body
```

And then used:

```Nim
primScope:
  let a = 10 # no problem
  var b = a # triggers the error
```
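
The same idea can also be sketched as an ordinary macro that walks the untyped AST, in the spirit of `freshIdentNodes`. This is only a sketch: `rejectMutation` is a hypothetical helper, this variant of `primScope` is not the term-rewriting version above, and it only catches `var` sections and plain assignments (not `var` parameters, `inc`, `add`, and so on).

```nim
import std/macros

proc rejectMutation(node: NimNode) =
  ## Recursively walk the AST and raise a compile-time error on `var`
  ## sections and plain assignments.
  case node.kind
  of nnkVarSection:
    error("var declarations are not allowed in a prim scope", node)
  of nnkAsgn:
    error("assignment is not allowed in a prim scope", node)
  else:
    for child in node:
      rejectMutation(child)

macro primScope(body: untyped): untyped =
  rejectMutation(body)   # fails compilation if a forbidden construct is found
  result = body          # otherwise the code is passed through unchanged

primScope:
  let a = 10
  let b = a * 2
  doAssert a + b == 30
  # var c = a   # uncommenting this line makes compilation fail
```

Since the macro sees the body before type checking, it can only reject syntax; anything that depends on types (for example, calls to procs taking `var` parameters) would need a typed macro or compiler support.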

Araq commented 4 years ago

I'd like to think I'm welcome in the community (as well as other Haskellers more experienced and skilled than myself!).

Sure, you are welcome.

zetashift commented 4 years ago

I like FPisms too, but I think enforcing this on the level you want is against something Nim stands for; like Clyybber said: "IMO Nim is about enabling you to write code; not forcing you to write it in a specific style."

For people wanting a more FP style in Nim, I don't see why using something like nimfp and a pattern matching lib like https://github.com/zer0-star/matsuri and sprinkling your code with let's shouldn't get you there.

bbarker commented 4 years ago

@zetashift it is the same idea with linting and type checking - you get a tool to help you enforce the style you want and flag style that could lead, potentially, to errors. This is just more of what is already present, but at an optional level. Maybe a macro is the right way to do that, or maybe some of the suggestions here could be handled in the compiler as options, and others as part of a macro.

I'm still going through the language, and appreciate the feedback so far.

bbarker commented 4 years ago

(No, that Haskell 4-liner is not Quicksort, study more CS theory if you don't believe me.)

@Araq regarding Haskell quicksort, you might be interested to know that the practical way of doing this in Haskell tends to be something like what the Vector library does with introsort (a variant of quicksort). It actually uses mutable vectors.

The vector library has freeze and thaw functions that allow for conversion between mutable and immutable vectors (the unsafe variants do not copy the vector first). The production code here is not terribly long, though not as short as the quick-sort involving mem copies.

So when writing functions that require mutation in Haskell, one has to cheat a little to be speedy. If the desired output is an immutable vector, it would go something like:

thaw the vector (thus making a mutable copy of the original immutable vector)
perform mutable algorithm like sort on thawed copy
unsafeFreeze the thawed copy (thus resulting in no copies in this step)

So no additional copies are performed beyond the one required by the hypothetical API to maintain immutability.

I just finished reading tutorial 1 for Nim, and it is quite interesting. In Nim, the mutability of arrays is determined by their declaration style (let/const vs var) and not by types as in Haskell, but I think a similar API could be achieved by (1) starting with a let-declared array, (2) calling a function that takes in an array and performs some actions on a mutable copy of the array, (3) assigning that back to some constant: let outArray = modifyArray(arrayFunc, inArray).

In this hypothetical context of Prim, var is not allowed, so step 2 (i.e. implementation of modifyArray) would need to be done outside the context of Prim, which is fine I suppose. Let me know if I have anything wrong here. I think doing (2) is possible but I need to learn more about working with arrays generically.
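
A minimal sketch of that pattern, under a few assumptions: it uses `seq` where the text talks about arrays, the helper is called `modifySeq` (hypothetical, standing in for `modifyArray`), and the input comes first so that the element type is inferred from it. The helper copies the immutable input, lets the supplied proc mutate the private copy, and returns the copy as a new value:

```nim
# Hypothetical helper: the only place where mutation happens is on a copy
# that the caller never sees until it is returned.
proc modifySeq[T](input: seq[T], f: proc (s: var seq[T]) {.closure.}): seq[T] =
  result = input   # seqs have value semantics, so this is a copy
  f(result)        # mutate the copy in place

# An in-place algorithm written with ordinary mutation.
proc insertionSort(s: var seq[int]) =
  for i in 1 ..< s.len:
    let key = s[i]
    var j = i - 1
    while j >= 0 and s[j] > key:
      s[j + 1] = s[j]
      dec j
    s[j + 1] = key

let inSeq = @[3, 1, 4, 1, 5]
let outSeq = modifySeq(inSeq, insertionSort)
```

Unlike the `thaw`/`unsafeFreeze` sequence described above, nothing here is unsafe: the single copy is made up front and the result is simply bound with `let`.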

Araq commented 4 years ago

So when writing functions that require mutation in Haskell, one has to cheat a little to be speedy.

You need to argue from first principles. Why should we strive to copy Haskell's annoying designs? Where is the proof that even single owner mutability causes harm? How do we know that Haskell's designs are not worse? The road to hell is paved with good intentions.

bbarker commented 4 years ago

@Araq I'm not aware of any large studies done on the topic, so I can't do that. I can only say it is appealing to me in general, and ultimately it may truly be the case that the more important thing for a PL is how it meshes with the particular programmers involved.

I think it is intuitive to me that mutation can cause program analysis to be difficult (not all mutation certainly). One particular thing that stands out to me in this regard, in Nim, is the presence of var parameters. They could cause surprising behavior quite easily if one doesn't take care to chase down the function definition, and I'm curious about their use case (as compared to say, returning a pointer or ref). This goes back to the Vector library in Haskell: some mutability is OK if used wisely and in isolation, but the more global it becomes, the more potentially unwieldy. Nim's use of let seems very powerful in this regard, as far as I understand it.

As I feel at this point it is impossible to convince people that "my way is better than your way", and I still think Nim is a great language and has a number of advantages over Haskell (Haskell's support for records is awful for instance), I'd rather focus, for now, on some more tangible questions if that is OK, such as the feasibility of using something like modifyArray as mentioned above.

I think I've probably said something before in this thread already: I'm not trying to convince most Nim programmers to program a certain way. I'm trying to gather the tools I need to program Nim in a way that is comfortable to me, and in so doing, I hope to produce something useful for others as well, and maybe I'll pick up some liking for mutability on the way - who knows!

Araq commented 4 years ago

I'd rather focus, for now, on some more tangible questions if that is OK, such as the feasibility of using something like modifyArray as mentioned above.

Sounds feasible. Give it a try. distinct types could be of help.

akavel commented 4 years ago

@bbarker I would heartily recommend trying to implement your ideas as a macro. Macros are a super powerful concept & feature in Nim, and I believe going this way could have tons of benefits both for the project and the community:

composability and possibility of gradual "opt-in" by users - people could start by cautiously dipping a toe in your idea and testing it in a "small way", then if it proves useful to them, the idea could win them over gradually, letting them migrate code to the new model on their own terms and being in control of the process; giving users agency & control (though with a healthy dose of good, i.e. clear and informative, marketing) is a very powerful notion in convincing them;
using the {.push.}/{.pop.} feature would automagically give something akin to Rust's "unsafe" feature, a.k.a. an "escape hatch" that is very important & useful as a consequence of the "Joel's law of leaky abstractions";
the Nim community (relatively) much better understands how macros can be written than how the compiler itself is written, so you could get help & support much easier; also the macros API is relatively stable, while the compiler internals may not be so at all;
I believe with macros, it will also be much easier for you to track your changes when "starting small" and iterating than when modifying the compiler; also I expect it should be (relatively) much easier for you to debug any issues & harder to shoot yourself in the foot when writing macros than when modifying the compiler.

The one potential disadvantage that comes to my mind is that doing some things straight in the Nim compiler might potentially be much easier. However, I believe going the macro way (vs. a fork) gives this project a much bigger chance of adoption & success. Also I'd think you can sometimes do quick PoC/experiments for yourself by modifying the compiler if that's easier for you, and then, after validating them, migrate them to the macro(s). Writing a macro is, in a wider perspective, de facto writing a compiler, with the most tedious parts already taken care of. It also has the benefit that you can do it in an incremental/patch-like way, i.e. starting from a completely unmodified AST, then tweaking it in any way you like.

edit: I especially recommend focusing on mratsim's comment above, it contains a lot of very valuable practical ideas that could help you get started in a very "nimsy" way.

bbarker commented 4 years ago

Thanks @mratsim , @akavel and others for comments indicating a way forward. I'm going to close this for now both because it appears too broad for an RFC given current interest, and because, as stated, it sounds like exploring macros will be a good way to get started. Though at some point, if we find something in the compiler that might be tuned or isn't possible with macros, I may open up an RFC for that.

I'll slowly be looking into this, but others who come here and are interested, feel free to watch https://github.com/Prim-Lang/Prim/issues or add your own ideas there. I may also chime in on this thread from time to time, even though closed.

nim-lang / RFCs