nim-lang / RFCs

A repository for your Nim proposals.

Renaming openarray #312

Open mratsim opened 3 years ago

mratsim commented 3 years ago

openArray is soon to be a first-class type in Nim (#178).

The name is inherited from Pascal, Modula 2, Oberon.

It corresponds to the following concepts in other languages:

Given that view types are becoming first-class, I propose that we also gradually change the name openarray to something less foreign to developers who did not grow up with a Wirth language (Niklaus Wirth designed Pascal, Modula, and Oberon).

I think the new name should:

Since the type names Slice and Range are already taken, we can't use those. That leaves span as in C++, ArraySlice as in Swift, or coming up with our own.

Name

I propose the name ArrayView, which is more descriptive than span and more likely to be grasped by users coming from higher-level languages than C++. It also reuses the views narrative in the docs and the {.experimental: "views".} pragma or compiler flag. In terms of typing, it has the same number of characters as openArray today.

Procedures

Furthermore, I suggest we rename toOpenArray(container, start, stop) to toView(container, start, stop) and add an overload toView(container) that spans the whole array or seq. The first will significantly improve their use in low-level libraries that routinely deal with either C code (ptr + length) or buffers from memory, streams, IO, protocols, cryptography, ... The second covers the very common use case where we want to pass a no-copy object to delegate processing, for example across a channel.
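As a rough illustration of the two proposed procedures, they can be approximated today with templates that forward to the existing toOpenArray. This is only a sketch: toView is the hypothetical name from this proposal, not part of the standard library, and total is a helper invented for the example.

```nim
# Hypothetical aliases for the proposed names; they simply forward to
# the existing toOpenArray, so no new view type is introduced here.
template toView(x: untyped; first, last: int): untyped =
  toOpenArray(x, first, last)

# Whole-container overload: spans from the first to the last index.
# Note that `x` is evaluated more than once in this sketch.
template toView(x: untyped): untyped =
  toOpenArray(x, x.low, x.high)

# Helper invented for the example: consumes a view without copying.
proc total(xs: openArray[int]): int =
  for v in xs: result += v

let data = @[10, 20, 30, 40]
echo total(toView(data, 1, 2))  # 50
echo total(toView(data))        # 100
```

As with toOpenArray itself, the result of these templates is meant to be passed directly as a call argument rather than stored in a variable.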

Stretch goal

Stretch goal (emphasis mine): backward compatibility is a significant concern.

Very often, especially during Advent of Code season, even experienced people use the slicing syntax a[start ..< stop] instead of a.toOpenArray(start, stop-1), which leads to extra allocations and GC pressure. The toOpenArray syntax is very long to type and needs an easy-to-forget -1 in most cases.
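A minimal sketch of the difference, using a helper proc sum that is invented for this example: both calls compute the same result, but the slicing form first builds a new seq, while toOpenArray passes a view over the existing storage.

```nim
# Helper invented for the example; any proc taking openArray would do.
proc sum(xs: openArray[int]): int =
  for x in xs: result += x

let data = @[1, 2, 3, 4, 5]
let (start, stop) = (1, 4)

# Slicing allocates a temporary seq holding copies of data[1..3].
echo sum(data[start ..< stop])               # 9

# toOpenArray takes an inclusive upper bound, hence the `stop - 1`.
echo sum(data.toOpenArray(start, stop - 1))  # 9
```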

Now that the compiler has escape analysis, we should consider making the slicing syntax produce a view by default. We can have a[start..<stop].clone() to produce the old seq[T] if needed.

Note: we can also consider having slice return a view if the variable doesn't escape and a seq if it does escape, but slicing would require a special signature for that. That said, the capability would be really interesting beyond slicing:

  • Futures and Flowvars are in many cases consumed in the proc that allocated them; having the capability to decide the allocation (stack or heap) can significantly reduce overhead. (Measured 2x on Weave overhead-bound benchmarks compared to an optimized memory pool, and an order of magnitude compared to malloc or the GC.)
  • Tasks, Generators and closure iterators. If an iterator or generator is created and consumed in its scope, it can be stack allocated. This would significantly help the C compiler, potentially leading to constant folding and "disappearing closure iterators" (https://godbolt.org/g/26viuZ)
  • That capability is called in C++ "Heap-allocation elision", though I think a dedicated RFC is necessary.

Trivia

toOpenArray is called slice in the compiler (with the compiler magic mSlice)

Clyybber commented 3 years ago

We will probably need two kinds of slices. One that can be copied and one that cannot be copied.

c-blake commented 3 years ago

When the length gets me down, I just do

template toOA(x, y, z): untyped = toOpenArray(x, y, z)

Like the template above one could probably type View[T] = openArray[T].

I will say that if you want to call it toView then I think the type should be called View not ArrayView just for internal consistency in Nim. (or toArrayView which is exactly as long as toOpenArray...)

I think (Array)View/to(Array)View could exist alongside openArray/toOpenArray for a long time, but I'm also very unsure that this is worth the confusion that might also arise from such name duplication for users. I can already see the "What is the diff between toArrayView and toOpenArray?" questions. There are several such Wirth-isms (e.g. ord/chr, though Python picked that one up, too). So, I think this is probably a tough judgement call. If Slice were untaken in Nim it might be easier.

Araq commented 3 years ago

So ... you want to rename it from a long name to an equally long name? And for what exactly, the users who cannot read a basic manual? There is no evidence that newcomers have trouble picking up openArray as long as it happens to be part of the learning material they use.

ZoomRmc commented 3 years ago

I'm not sure about renaming but I fully agree slicing should produce a view by default.

Regarding a rename: if something needs to be picked, it should be short and clear. Off the top of my head: Frame could work. It's clear that it's a limited view, it's short, and it's not awkward to type or pronounce, as in "framed view", "frame that seq", etc. On a semantic level I think it suits better than Slice, with which we're mostly stuck for historical reasons: slicing an object means physically removing a part of it, which is not happening with a view, and by this time most slices in other languages are reference types.

mratsim commented 3 years ago

If we want a short name we can also reuse span from C++, but both slice and view are more descriptive than openarray.

We can always defer to learning material but we can also help people by using easier names.

Regarding frame, I've never used it in that context so I can't comment.

Araq commented 3 years ago

Given the current design problem with first-class openArrays, we might indeed need to rename openArray. We need both an immutable and a mutable variant, and the planned var openArray proved to be insufficient:


proc `[]`(t: Table; key: K): var V

var x: Table[openArray] # but now `[]` only offers a single level of mutability!

So I propose View (immutable view) and MView, as the M prefix for "mutable" is already used for mitems.

HugoP707 commented 3 years ago

I don't like the idea of renaming it; however, I really want an overload for toOpenArray(container) that spans the whole array or seq.

c-blake commented 3 years ago

I like short names (more than most) and am ok with View/toView and MView, but if folks care more about backward compat. then the existing openArray could pair with mOpenArray. There is enough Nim code out there using openArray (EDIT: at least 30% of nimble packages) that we may never want to deprecate that name. We may never have to, but "rename" to me suggests eventual deprecation.

ZoomRmc commented 3 years ago

Wouldn't it be better to reserve View and MView for concepts, as these seem like more general terms, applicable to generic containers?

c-blake commented 3 years ago

I may be wrong, but I think if we could get -d:v1 below to work the bike-shedding about what name to use could maybe be reduced (i.e. C++ folk could use Span and MSpan since they're used to that and others could use whatever):

# template toView(x,y,z): untyped = toOpenArray(x,y,z) # just works
let x = [0, 1]
when defined(v1): # Error: type expected {or id expected if (View)}
  template View = openArray
  proc foo[T](x: View[T]): int = discard
  echo foo(x)

when defined(v2): # C error: too many args to function
  type View[T] = openArray[T]
  proc foo[T](x: View[T]): int = discard
  echo foo(x)     # gen code calls foo_mangle(x_mangle, 2)

when defined(v3): # works, but may not inherit magic properties
  type View[T] = concept a
    a.len is Ordinal
    a[0] is T
    for x in a: x is T
  proc foo[T](x: View[T]): int = discard
  echo foo(x)

I think v2 may be a bug worth reporting, and it might also lose the magical properties of openArray (and of some forthcoming mOpenArray, MopenArray, or whatever).

v3 works, but seems unlikely to inherit any magical properties. Maybe someone more steeped in the order of events in the compiler knows that even v1 will lose magical properties?

These questions also relate to symbol aliasing questions/PRs, linked to just for reference.

juancarlospaco commented 3 years ago

I am neutral.

...but please consider that a ton of code uses openArray, so renaming/removing it will break a lot of code. :neutral_face:

Araq commented 3 years ago

openArray remains and needs to remain. We need the distinction between View and openArray: var openArray has an []= operator but var View does not! It'll be complex... :-(

alaviss commented 3 years ago

Will the notion of distinct openArray ever be introduced? It could be useful for slicing a string so that echo for example will print the string instead of an array.

mratsim commented 3 years ago

> Will the notion of distinct openArray ever be introduced? It could be useful for slicing a string so that echo for example will print the string instead of an array.

I use this construct instead

proc foo[T: not char](oa: openArray[T]) =
  discard # body omitted in the original comment

alaviss commented 3 years ago

I'm not sure how that relates to the usage I was referring to...

mratsim commented 3 years ago

It prevents openArray[T] from matching against a string.
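A small sketch of what that constraint does, assuming a proc foo that just returns the length (the body is invented for this example): the call with an int array resolves, while calls that would infer T as char, including a string argument, are rejected at compile time.

```nim
# The body is a placeholder invented for this example.
proc foo[T: not char](oa: openArray[T]): int = oa.len

# Elements are int, so the constraint is satisfied.
doAssert foo([1, 2, 3]) == 3

# A string would bind T to char, which the constraint excludes.
doAssert not compiles(foo("abc"))

# The same applies to an explicit array of char.
doAssert not compiles(foo(['a', 'b']))
```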

alaviss commented 3 years ago

I was referring to this usage:


type StringView = distinct openArray[char]

proc `[]`(s: string, range: Slice[int]): StringView

proc print[T](oa: openArray[T])
proc print(sv: StringView)

print("string"[0..2]) # prints "str"
print(['s', 't', 'r']) # prints "['s', 't', 'r']"