[RFC] guidelines for when to use typedesc vs generics

timotheecour commented 6 years ago

This issue keeps popping up in different places (forum, PR's, issues) [1]. Let's centralize discussion in one place specific to this question instead of scattering the discussion. Choosing between typedesc vs generics affects how we use the APIs so having guidelines is good.

here's my understanding of what guidelines should be. I'll keep this top-level post up to date w comments below

generics pros

supports IFTI (implicit function type instantiation), aka generic type deduction eg: with proc foo[T](a: T) calling foo(1) infers T=int

there is no such notion with typedesc, it wouldn't make sense (cf mentioned in https://github.com/nim-lang/Nim/issues/3502#issuecomment-379115313)

makes clear distinction for callers bw runtime args (or static[T] compile time values) and compile time types, eg:
```
parse[types,...](runtime_args,...)
#or
first_arg.parse[types,...](runtime_args,...)
```
can instantiate to create a function:
```
let foo_inst=foo10[int]
echo foo_inst(10)
echo foo_inst.type.name
```
that would not be possible with typedesc (short of resorting to lambda etc)
can potentially (with language support) be used with a shorthand for the common case of single template calls, as suggested here https://github.com/nim-lang/Nim/issues/3502#issuecomment-377843974 : a.foo!int (or whatever other symbol that would make it un-ambiguous)
consistency/familiarity: that's how most (all?) languages with metaprogramming support work as mentioned here: https://github.com/nim-lang/Nim/issues/7430#issuecomment-379196859

D: void foo(T:U)()

C++ : template<typename T> class fun and template<> class fun<int> and std::enable_if for more complex constraints

C#: https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/generics/constraints-on-type-parameters: public class foo<T> where T : U

swift: https://medium.com/developermind/generics-in-swift-4-4f802cd6f53c func foo<T: U>()

generics cons

call syntax foo[T](x) ambiguous (generic or array subscript depending on foo) , cf https://github.com/nim-lang/Nim/issues/3502

NOTE: IIUC, the ambiguity mentioned is only ambiguous for the lexer (or syntax highlight), not for the compiler

NOTE: this can be fixed by adding a syntax, eg some!(T)(x) as in D (https://github.com/nim-lang/Nim/issues/3502#issuecomment-377843974) or foo[:T](x) (https://github.com/nim-lang/Nim/issues/3502#issuecomment-378179740); unclear whether this will ever be changed though (lots of code would need to upgrade) EDIT foo[:T](x) is now implemented

EDIT no support for partial generic deduction yet (but see https://github.com/nim-lang/Nim/issues/7529 [RFC] partial generic deduction), which makes currently input.to[T] impossible, whereas input.to(T) works
EDIT can't be used with default values, tracked here https://github.com/nim-lang/Nim/issues/7516

typedesc pros

has advantages of behaving like a regular function parameter, eg:

can be used with UFCS, eg: int.foo or with operator syntax foo int ; with generics this would not make sense

can be used with named parameter, eg:

proc foo8(T1:typedesc, T2:typedesc): auto = return "T1:" & T1.name & " T2:" & T2.name
echo foo8(int,string)
echo foo8(T2=int, T1=string)

typedesc cons

as mentioned in https://github.com/nim-lang/Nim/pull/7481#issuecomment-379037090

Some programmers could probably have trouble remembering whether if it was read(int, s) or read(s, int). readint adds the flavor of knowing which part is the compile time arguments and which part is the method arguments
can be hard to tell whether an argument is a type or a value, eg: foo(foo.type, bar) or worse
EDIT can't be used at compile time, unlike generics, see https://github.com/nim-lang/Nim/issues/8486 ; however, a workaround is available via an intermediate generic, see that issue

same for generics and typedesc

both a compile-time only thing (cf https://github.com/nim-lang/Nim/pull/7481#issuecomment-378931977)
both can be specialized / overloaded, eg proc proc foo[T:int|float]() or proc foo(T: typedesc[int|float]) (https://github.com/nim-lang/Nim/issues/3502#issuecomment-378608341)
both can be used with concepts, eg: proc foo[T:isSomeConcept]() or proc foo(T: typedesc[isSomeConcept]) * neither can be used with default values, but see https://github.com/nim-lang/Nim/issues/7516, [RFC] default type parameter for generics and typedesc parameters #7516 this is now fixed for typedesc

recommendations ASSUMING https://github.com/nim-lang/Nim/issues/7529 is going to be fixed in near future

fix ASAP https://github.com/nim-lang/Nim/issues/3502 (ambiguous syntax) eg with @Araq's foo[:T](x) syntax or D's foo!(T)(x); if possible add (nonambiguous) shorthand for common case of single template param (eg x.too!T or whatever's nonambiguous)
when IFTI is needed (ie when type deduction from params is needed), use generic * when type T is an output type (not an input type), use generic, eg: input.to[T]
EDIT: else, use typedesc

question: curious whether there are cases where typedesc is truly preferred? (besides syntactic niceties of allowing UFCS and named parameters)

EDIT /cc @dom96

proposal: CT_typedesc (pending https://github.com/nim-lang/Nim/issues/7529) /

~~when a function is compile time only (eg no runtime parameters AND can be computed at compile time), use typedesc. In all other cases, use generics. Eg:~~

## property-like proc
proc bar(typedec:T):auto=T.name
## void return proc (eg checks)
proc someStaticCheck(typedec:T) = T.name == "foo"
## compile time values are ok
proc getNth(typedec:T, n: static[int]): auto = T.name[n]

[1] examples of recent discussions on typedesc vs generics:

https://github.com/nim-lang/Nim/issues/3502#issuecomment-378608341 https://github.com/nim-lang/Nim/issues/7430#issuecomment-377412261 https://github.com/nim-lang/Nim/pull/7481#issuecomment-379093787

EDIT without such guidelines we end up with having both kinds of functions, eg https://github.com/nim-lang/Nim/pull/7512 Add none[T]() as alias to none(T)

GULPF commented 6 years ago

makes clear distinction for callers bw runtime args and compile time types, eg:

This isn't true in general for Nim, since static[T] parameters look like normal parameters but are compile time only.

I think the strongest argument for generics is that most people are more familiar with them because of C#/Java/etc. Another argument is that there is a clear connection between type and type creation, e.g Table[string, int] is created with initTable[string, int]().

However, for some procs generics doesn't look nice syntactically. Examples:

let a = new int
let b = new[int]()

let c = sizeof(int)
let d = sizeof[int]()

For these single argument procs, generics introduce a lot of syntactic noise.

andreaferretti commented 6 years ago

For me the main distinction is that a typedesc parameter is just another parameter that can participate in overloading resolution, and I use them for ad-hoc polymorphism. I use generic parameters for generic polymorphism.

An example taken from emmy is

proc zero(T: typedesc[int]) = 0
proc zero(T: typedesc[float]) = 0.0
proc zero(T: typedesc[int32]) = 0'i32

and so on.

I also use typedesc parameters for the cases where the type T appears in return position but not in argument position. In this case, the compiler would not be able to infer it anyway. An example taken from neo is

proc zeros(n: int, T: typedesc[float64]): Vector[float64] = ...
proc zeros(n: int, T: typedesc[float32]): Vector[float32] = ...

In this case the typedesc parameter is useful to disambiguate, so that I can call zeros(5, float32), but cannot be inferred from the compiler since there is no argument to infer it from. Again, I could not use a generic here because there is not generic zeros procedure that would work for all types T

mratsim commented 6 years ago

In my case, I use the generics syntax for types everywhere.

This gives

proc zeros[T: SomeReal](n: int): Vector[T] = ...

Called with zeros[float32](10) (note: fixed after Timothee's comment)

If the T is used only for return value type in a typedesc notation I would use the _ symbol.

proc zeros(n: int, _: typedesc[float64]): Vector[float64] = ...
proc zeros(n: int, _: typedesc[float32]): Vector[float32] = ...

timotheecour commented 6 years ago

@mratsim

proc zeros[T: SomeReal](n: int, _: typedesc[T]): Vector[T] = ...

why not simply proc zeros[T: SomeReal](n: int): Vector[T] = ... ?

timotheecour commented 6 years ago

@andreaferretti everything you mentioned in https://github.com/nim-lang/Nim/issues/7517#issuecomment-379183154 can be done with generics just as well (with simpler declarations even)

proc zero[T: int]() = 0
proc zero[T: float]() = 0.0
proc zero[T: int32]() = 0'i32

proc zeros[T:float64](n: int): Vector[float64] = ...
proc zeros[T:float32](n: int): Vector[float32] = ...

@GULPF

This isn't true in general for Nim, since static[T] parameters look like normal parameters but are compile time only.

updated OP with runtime args (or static[T] compile time values) Although IMO this could be supported, as in D, C++ etc: proc foo11[T, c: static[string]](): auto = # c is a compile time value of type string

siliconvoodoo commented 6 years ago

TLDR but I searched the word "deduced" and it doesn't appear in this thread. The biggest importance of generics is that they are automatically deduced at the call site. That serves the purpose of giving very readable duck typing, or tricks where type(A) would have to be used otherwise. with potential DRY violation because of the need to use A and type(A) sometimes.

Also today Nim doesn't appear to support partial auto deduction, if you have 2 generic parameters and you want to auto deduce one but explicit the other at the call site, it's not accepted. Allowing naming the parameters at call site would help too. (same case than normal proc named params)

Lastly, variadics. need. otherwise can't do a proper createThread function for example. Even today it has an overload for arity0 and arity1. how unstylish...

andreaferretti commented 6 years ago

@timotheecour Your mileage may vary, but I find

proc zero[T: int]() = 0
proc zero[T: float]() = 0.0
proc zero[T: int32]() = 0'i32

very odd. I did not even think to do this - I realize it can be done but it seems a little abuse to use type bounds with a concrete type. Then again, it is just a matter of preference

dom96 commented 6 years ago

NOTE: this can be fixed by adding a syntax, eg some!(T)(x) as in D (#3502 (comment)) or foo:T (#3502 (comment)); unclear whether this will ever be changed though (lots of code would need to upgrade)

No, both syntaxes would be supported. We wouldn't just force everyone to use [:T].

On a side note, you use a lot of acronyms in your comments which I do not understand. I know you're writing a lot, but please try to expand them, examples that I saw include "bw", "IIUC", "IFTI", "cf" (this one I can deduce from context, but after looking it up it seems that it means "compare" which doesn't make sense to me, Wikipedia suggests "see" should be used to point to a source of information: https://en.wikipedia.org/wiki/Cf.).

Of course, I must thank you for summarising the discussion. It's a great summary :)

can be used with named parameter, eg

The code below this doesn't compile.

Seems that the pros/cons lists speak for themselves. The only real advantage typedesc has can arguably be implemented for generics too, but to me these advantages have never been useful.

The fact is that mixing values and types in a single list of arguments is messy and will undoubtedly lead to some strange issues.

PMunch commented 6 years ago

I'm on mobile so this will be brief. In my opinion it is dependent on usage. I use generics when my type is inferable or the same proc should be callable with the different types (not different procs with the same name). Typedesc should be used when it is meant that the user has to specify the type in order to get the correct result. As an example the read procs discussed earlier and implemented in my PR. Those use typedesc as the user has to describe which type it is meant to read and return. A similar example would be the parsing procedures discussed under the to RFC, these also do different things based on the users input type.

An example of generics would be for different types, but which all shared say a length field. Then declaring a generic len proc to return the length independent of the input object would be the right choice.

So to sum up, generics for when the type can be inferred, typically from some input object, or the same procedure does the same for all inputs (barring possible conversions). Typedesc for when the user is meant to supply a type and/or the effect of the called procedure is not the same (again the read proc where the stream is advanced by a different amount).

mratsim commented 6 years ago

@timotheecour: copy-paste issue, fixed in original post

metagn commented 6 years ago

How my brain sees it: To me, call arguments change what the procedure does or returns, generics help with types.

In the case of new(T) this fits, since it allocates memory based on the type it's given.

In the case of sizeof(T), it works because the result is a property of T, similar cases are zero(T), low/high(T), T of U (predicate)

In the case of zeros[T](n), this works too in my opinion. Attention is shifted towards the result, which is a sequence, and n is the length of that sequence. The difference between the zero values of different T's is less important compared to the length of the sequence, so choosing between zeros(T, n) and zeros(n, T) is unneeded when you have a section for type arguments.

However, if zeros didn't take a length parameter and was an infinite iterator, zeros(T) would be preferrable. Iterators have more singular significance than sequences.

In the case of cast[T](value), as a low level operation, the value isn't changed, its type information is. to(a, T) is the opposite, it changes the value so it fits T. The argument order in to(a, T) might come off as odd, but I think that could be fixed by making it infix so it reads like English. as could work too in this manner.

In the case of none(T), it could be faster to write, but doesn't make sense, as T doesn't mean anything and just allows the return type to be Option[T].

In the case of read[T](stream) or read(stream, T), a parallel could be found to the difference between cast-ing and to-ing in the interpretations of the name "read". Does reading an int read and cast some bytes to an integer, or does it read a string and parse its value?

Stuff like this shouldn't be important. Maybe some people have problems with Nim because stuff like this isn't regarded as important, but all I know is I don't care for it and I shouldn't have spent the time I did writing this comment.

timotheecour commented 6 years ago

@siliconvoodoo

TLDR but I searched the word "deduced" and it doesn't appear in this thread. The biggest importance of generics is that they are automatically deduced at the call site. That serves the purpose of giving very readable duck typing, or tricks where type(A) would have to be used otherwise. with potential DRY violation because of the need to use A and type(A) sometimes.

I had mentioned it but using a different name: IFTI (implicit function type instantiation) which was coined in D. I just edited OP to mention deduction

Also today Nim doesn't appear to support partial auto deduction, if you have 2 generic parameters and you want to auto deduce one but explicit the other at the call site, it's not accepted.

Indeed, just filed https://github.com/nim-lang/Nim/issues/7529 ; thanks for raising this point; in fact, this is a blocker; if partial deduction isn't addressed, typedesc approach is to be preferred.

Allowing naming the parameters at call site would help too. (same case than normal proc named params)

feel free to open an RFC issue for that; (I can't think of a concrete use case for that but I could be wrong), so we don't conflate issues

Lastly, variadics. need. otherwise can't do a proper createThread function for example. Even today it has an overload for arity0 and arity1. how unstylish...

let's not conflate issues :-) this is already addressed here https://github.com/nim-lang/Nim/issues/1019

zah commented 6 years ago

I'll try to summarize my position here.

At the moment, the support in the compiler for explicit generic params is very much incomplete and requires quite a lot of work to get right. Since I believe that typedesc params are superior, I don't think we should invest development time in fixing the issues, instead we should promote the use of typedesc and we should further limit and deprecate the use of explicit generic params.

Here are the issues:

1. The generic params don't support overloading properly.

The simple reason is that they are not subjected to the same logic in sigmatch as the typedesc params and it would be quite complicated to change this. The examples for "overloading" that were provided will break down as soon as the notion of overload specificity is introduced:

proc zero[T: auto]() = 0 # handle any type
proc zero[T: int32]() = 0'i32 # provide a more specific overload for `int32`

The above will fail with the error Error: redefinition of 'zero'

For the same reason, it would be hard to introduce default values and keyword arguments for the generic params. The compiler is just not written with the idea that overloading should be supported and the signature matching happens in a much more rudimentary fashion.

2. There is no good support for partial application of the generic params

This problem will be somewhat easier to solve, but keep in mind that I've known about it for several years, I had various issues with the implicit generic params that would have been easier to solve if partial application was possible, but nonetheless I never found the time to do the required refactoring.

3. The typedesc params are just better.

When it comes to syntax, I think having a single uniform function call syntax that covers everything is just more elegant:

var f = new Foo
var t = stream.read Transaction
var m = Matrix.identity

Nim tries to feel like a scripting language and treating the types as regular values will be natural to people who have been exposed to Ruby or Python.

Please note that typedesc and static params are considered implicit generic parameters. The user should be able to specify concrete values for them when obtaining a pointer to a proc featuring them:

var identityProc = identity[Matrix]

On a side note, the newer already supported way to define typedesc procs looks like this:

proc zero(T: type int): T = 0
proc zero(T: type float): T = 0.0
proc zero(T: type int32): T = 0'i32

I plan to migrate all of the official docs to this style once it's supported for static[T] as well. Then var, ref, type and static will work consistently as modifiers in proc argument types and concept definitions.

Considering all of the above, I think the guideline should simply be:

1) Use generic types for generic value params, where the types will be inferred from the call-site. 2) Use typedesc for all params that must be explicitly specified by the user.

siliconvoodoo commented 6 years ago

Why not: get rid of generics in the semantic pass. Then handle them by a pure syntactic transform that rewrites generics as typedesc parameters. the same way than "this" is implicit in C++, or in Nim with the self pragma.

metagn commented 6 years ago

your post doesnt make sense to me, are you trying to present the issues generics have already or are you reasoning typedesc parameters being better with these issues? because your post was prefaced with "generics problems are not worth fixing because typedesc params are better", and to me it looks like you were trying to say typedesc params are better because they dont have those problems.

yes, youre right, typedesc params have all the cool things that come with regular argument overloading that generics don't currently have. that does not mean generics don't have an idea behind them, like that it's not a good idea to mix a code generation facility with runtime arguments. in fact, in contrary to your statement, i think it's not elegant at all a thing to do that. it's more likely to confuse people coming from scripting languages than welcome them. i think this looks terrifying for a statically typed language:

var t = stream.read Transaction

"read transaction from stream"? to people from static languages this will look like voodoo. procedure calls should be procedural, there should be other constructs like type annotations and generics that the programmer, no matter what background, can understand only matters at compile time. compile time is something nim has, and to pander to users of scripting languages by ignoring it exists is not worth it. also i think this reads better than all other options:

var t: Transaction
stream.read(t)

these 2 things are very different ways of feeling like a scripting language:

routes:
  get "/":
    resp "something"

var foo = create object, Foo, "id", 3, "name", "John"

zah commented 6 years ago

@hlaaftana, the typedesc params are still a compile-time mechanism. When it comes to code generation, they behave exactly as regular generic params (the proc gets instantiated for each unique type).

metagn commented 6 years ago

I know. That's why having them in the same argument list isn't a perfect idea.

zah commented 6 years ago

If we want to have consistent libraries, we must deprecate one of the mechanisms. Otherwise, people will always choose the mechanism they like more and this thread have demonstrated that aesthetic preferences vary.

Here, I will argue that it would be much more painful to deprecate the typedesc params. Without them, it's impossible to define "new syntax" around types. The nice and short expressions such as new Foo or Matrix.identity will have to become mouthfuls such as new[Foo]() and identity[Matrix](). This will be extra painful in type associations, because the dot expressions returning types will now look like proc calls.

On the other hand, deprecating the explicit generic params is easy. At the generic proc definition we can simply warn about any params that were not used within the signature:

proc foo[A, B](x: B) # deprecation warning: 'A' is declared, but not used

To make the solution complete, such a message should be displayed only when the proc is not already deprecated (this is what we'll do with the existing procs in the stdlib).

timotheecour commented 6 years ago

@zah

If we want to have consistent libraries, we must deprecate one of the mechanisms. Otherwise, people will always choose the mechanism they like more and this thread have demonstrated that aesthetic preferences vary.

+1 on this

Here, I will argue that it would be much more painful to deprecate the typedesc params. Without them, it's impossible to define "new syntax" around types. The nice and short expressions such as new Foo or Matrix.identity will have to become mouthfuls such as new[Foo]() and identity[Matrix](). This will be extra painful in type associations, because the dot expressions returning types will now look like proc calls.

I've said this a few times before but here goes again: in D, when there's a single template argument the analog short form syntax is: bar!Foo() and identity!Matrix() which is not a mouthful and has advantage of nicely separating compile time vs runtime.

can bar!T syntax be introduced to mean bar[T] ?
if that's impossible because of ambiguity, can another easy to type symbol be used for 1-element generic? For this important purpose, I think it'd be fine to reserve 1 operator for that use case of 1-element generic.

here are some options:

bar!T
bar?T
bar@T
bar~T
bar::T
bar%T
bar^T
bar<>T

metagn commented 6 years ago

I'd just like to say that I don't really like how bar!T looks

Also to be clear I have no problem with identity(Matrix) and new(Foo), it's just that getting rid of all explicit generics like newSeq[int] feels sudden and unnecessary to me

timotheecour commented 6 years ago

side question: just filed https://github.com/nim-lang/Nim/issues/7596 ; is that a bug that can be easily fixed? if not, that makes generics more useful in that regard

Araq commented 6 years ago

In my opinion it's bad to have an RFC for this at this point. The standard library is conservative for better or worse. Nimble packages need to take the lead and then report back what works best. I ensured obj.field[:T](a, b) is at least now possible so that [T] is on equal footing and the stdlib heavily leans towards [T]. (I personally am slightly in favour of typedesc.)

I am not a fan of this recent trend of nailing down every single style aspect. Nim is not Perl but it's not Python either, differences in style are acceptable and almost inevitable in every program that has multiple authors.

Araq commented 6 years ago

is that a bug that can be easily fixed?

I think so.

if not, that makes generics more useful in that regard

Bugs should not be taken as an argument unless the number of bugs justifies an argument like "these bugs indicate an overly complex design".

zah commented 6 years ago

the stdlib heavily leans towards [T].

@Araq, my point expressed in some of the linked discussions is that this bias in the standard library is leading people to using this style in their own APIs, which is likely to create problems for them down the road (when they hit the overloading limitations).

My other argument is that it's relatively easy to fix some of the current bugs, but implementing proper overloading/specialization for the generic params is a more significant refactoring that's not likely to happen anytime soon (and not necessary given that alternative solutions exists).

dom96 commented 6 years ago

which is likely to create problems for them down the road (when they hit the overloading limitations).

Is it though? I argue that we need more evidence of this. I never needed to overload generics in the past and I am willing to bet that most users won't need to do it either.

I prefer generics because there is a clear separation of types and values. I don't like the mixing that typedesc introduces, and because of that I think it should be used in only specific circumstances.

bluenote10 commented 6 years ago

I often had problems with typedescs because they cannot be used for type conversions, see https://github.com/nim-lang/Nim/issues/8403. Usually I would have preferred to use typedescs but I couldn't because of this limitation. I just noticed that this limitation isn't real, see the work-around in the linked issue.

github-actions[bot] commented 9 months ago

This RFC is stale because it has been open for 1095 days with no activity. Contribute a fix or comment on the issue, or it will be closed in 30 days.

nim-lang / RFCs