ozra / onyx-lang

The Onyx Programming Language

Standard Type Namings #45

Open ozra opened 8 years ago

ozra commented 8 years ago

Standard Types - Namings

NOTE: this is only partially implemented at present, so this is mostly a discussion of what to implement and how.

Since Onyx uses the Crystal stdlib, there are already de facto names for common types. However, I feel there is a need for some clean-up from Onyx's perspective.

I'd like to favour terse names for the common types.

Proposed Type Names in Onyx

| Type Name | Description |
|---|---|
| Nil | The ubiquitous Nil-type, which plays an important role in Onyx |
| Any | Abstract type, mother of all types except Nil |
| Num | Abstract type, mother of all number types (corresponds to Number in Crystal) |
| Int | Abstract type, mother of all Int'ish (corresponds to Int in Crystal) |
| Intd | Int default. Defaults to ArchInt, unless specified otherwise for a specific program |
| Real | Defaults to ArchReal, unless specified otherwise for a specific program |
| Nat | Natural - non negative Int that is not binary unsigned (this would require some additional internal changes to types to be enforced) |
| List | Dynamically growable list (like vector, array, sequence in some languages. Called Array in Crystal) |
| Map | Map, currently HashMap implementation-wise |
| Tup | Tuple |
| TTup | TaggedTuple, aka NamedTuple |
| Set | Set (duh!) |
| Tag | Called Symbol in Crystal and Ruby |
| Str | String |
| Bool | Boolean |
| Ptr | Pointer - a lethal "raw pointer" |
| ArchInt | "Architecture Int" data type: pointer bit width for most platforms |
| ArchReal | "Architecture Real" data type: simply Float64 on most platforms |

Have I forgotten some obvious one?

"Machine Level" Data Types

Keeping these names terse could be good, and it also signals their "machine-closeness". (Do use the "cleaner" types like Intd, Real, etc. for most things! These are for type-defs, performance-critical code, and C-lib interfacing code.)

| Type Name | Description |
|---|---|
| F32 | 32 bits wide floating point |
| F64 | 64 bits wide floating point |
| I8 | 8 bits wide signed integer |
| I16 | 16 bits wide signed integer |
| I32 | 32 bits wide signed integer |
| I64 | 64 bits wide signed integer |
| U8 | 8 bits wide unsigned integer |
| U16 | 16 bits wide unsigned integer |
| U32 | 32 bits wide unsigned integer |
| U64 | 64 bits wide unsigned integer |

Suggested Definition of Arch* types

As you can see, this is heavily x86*-centric at the moment and has to be extended when other architectures are added.

Note that this pseudo-code only showcases the definition; in reality it will be specified only as "bit-width for Int and Real, respectively", and not as aliases.

ifdef x86_64
    type ArchInt    = I64
    type ArchUInt   = U64
    type ArchReal   = F64
else
    type ArchInt    = I32
    type ArchUInt   = U32
    type ArchReal   = F64
end
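
For readers who want to see what "pointer bit width" comes to on their own machine, here is a minimal Python sketch (Python, not Onyx). The helper name `arch_widths` is made up for illustration, and the two-way split simply mirrors the pseudo-code above; it is not how Onyx itself decides.

```python
import struct

def arch_widths():
    """Return (int_bits, uint_bits, real_bits), mirroring the pseudo-code above.

    Hypothetical helper for illustration only.
    """
    # struct.calcsize("P") is the size in bytes of a native pointer (void *).
    pointer_bits = struct.calcsize("P") * 8
    # Same split as the ifdef: 64-bit ints on 64-bit targets, otherwise 32.
    int_bits = 64 if pointer_bits == 64 else 32
    # ArchUInt follows ArchInt; ArchReal is F64 in both branches.
    return int_bits, int_bits, 64

int_bits, uint_bits, real_bits = arch_widths()
print(f"ArchInt = I{int_bits}, ArchUInt = U{uint_bits}, ArchReal = F{real_bits}")
```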

Thoughts?

stugol commented 8 years ago

non negative Int that is not binary unsigned

Firstly, what? And secondly, why is this needed?

Suggest IntBase or AnyInt instead of SomeInt. Some implies an Option = Some | None kind of concept.

Not sure about Arr. Why not just use the alternative syntax I propose in #31? After all, an array cannot exist without a concrete type.

What's a BufPtr?

ozra commented 8 years ago

1) If you want a variable that can only hold natural numbers (including 0...), then you need a Natural data type. Unsigned is a data type used for technical legacy reasons (primarily with FFI, or direct hardware communication). It's far better to double the number of bits to extend the range than to change the meaning of one bit to double the range. Ints and Nats can be compared without any confusion. At the machine-code level, the same optimizations as for unsigned can be used (muls becoming shifts, etc.), since the value is promised to be non-negative. Since it also stays within the range of the corresponding Int width, it can also be optimized when used in combination with Int. Win-win for performance. "Natural is more natural than unsigned". (In all honesty: on the binary level they are "unsigned"; they simply limit their range to never touch the sign bit.)

2) Agreed - both of those suggestions sound better.

3) Arr should probably just be Array; Arr sounds a bit weird, and static arrays aren't that common. As for alternative syntax, I've had a couple of ideas (for value creation though!) on a todo list; I'll jump into the discussion in #31. Alternatively, the word Array shouldn't be used at all, to minimize confusion!

4) A loose idea for a Ptr associated with a Buffer. It would be helpful for range checks during development, as extra testing insurance, and would compile to zero overhead in release mode. As an aside, it could also be helpful for certain GCs.
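
The arithmetic behind point 1 can be sketched in a few lines of Python (Python rather than Onyx, purely to show the ranges). It assumes a Nat of width N covers 0 to 2^(N-1) - 1, which is one reading of "never touch the sign bit"; all helper names are hypothetical.

```python
def int_range(bits):
    """Inclusive range of a two's-complement signed integer of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def unsigned_range(bits):
    """Inclusive range of a plain binary unsigned integer of this width."""
    return 0, (1 << bits) - 1

def nat_range(bits):
    """Assumed Nat range: non-negative, sign bit always clear."""
    return 0, (1 << (bits - 1)) - 1

def as_signed(value, bits):
    """Reinterpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

# The largest Nat32 reads the same whether its bits are viewed as Nat or as Int.
nat_max = nat_range(32)[1]
assert as_signed(nat_max, 32) == nat_max

# The largest U32 does not: the same bits read as -1 when viewed as Int.
assert as_signed(unsigned_range(32)[1], 32) == -1

# Doubling the width extends the range far more than reclaiming the sign bit.
assert nat_range(64)[1] > unsigned_range(32)[1]
```

Under that assumption, a Nat can be handed to any code expecting an Int of the same width without a conversion or a range surprise, which is not true of an unsigned value.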

stugol commented 8 years ago

So a Nat takes more memory but is higher performance. But how many bits does it have?

ozra commented 8 years ago

The idea is simply that it follows the same pattern as Int: defaults to architecture pointer width (unless unreasonable for the specific arch), but can be chosen explicitly. And naturally (ha) also Nat32/Nat64 or N32/N64.

stugol commented 8 years ago

Why would I use a Nat when I can use an Int?

ozra commented 8 years ago

Where a number of natural type is needed, it's better to be able to type it in the program and get the help that types are for, instead of implementing range checks yourself everywhere (and ifdefs to remove them for no-belts release speed). Indices, sizes/dimensions, etc. As a side note, Nat? can also (thanks to the unused bit, and with a lot of code-gen hacking) be stored in the space of a Nat, which would give a typed way to differentiate between, say, an index and "not set" with the same space and performance. So, there are quite a few perks.
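
The Nat? trick mentioned above can be illustrated with a small Python sketch (not Onyx, and not a description of any actual code-gen). It assumes "not set" is encoded as a word with the sign bit set; the sentinel value `NOT_SET` and the function names are made up for the example.

```python
NOT_SET = -1  # assumed sentinel: any pattern with the sign bit set would do

def pack_nat_opt(value, bits=32):
    """Encode an optional Nat (int or None) as one signed word of `bits` bits."""
    if value is None:
        return NOT_SET
    if not 0 <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} is not a Nat{bits}")
    return value

def unpack_nat_opt(word):
    """Decode a word written by pack_nat_opt back to an int or None."""
    return None if word < 0 else word

def find_index(items, wanted):
    """Return the index of `wanted` as a packed Nat?, or NOT_SET if absent."""
    for i, item in enumerate(items):
        if item == wanted:
            return pack_nat_opt(i)
    return pack_nat_opt(None)

found = unpack_nat_opt(find_index(["a", "b", "c"], "b"))    # an index
missing = unpack_nat_opt(find_index(["a", "b", "c"], "z"))  # None
```

The point of the sketch is that "index or not set" fits in a single machine word, with no extra tag byte, because a valid Nat never uses the sign bit.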

ozra commented 8 years ago

Proposition Update

As for the special treatment of allowing the standard Int/Real to be changed with a "first thing" alias, I realise: it's a pragmatic solution, so of course it should be a pragma instead, and it should only be possible to choose the bit width, not a specific type (you can't, for instance, have BigInt as the primary int type, for practical reasons). Otherwise it behaves the same.

Edit: It can only be set once for the entire program, of course.

'std-real-width = 64
'std-int-width = 32
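
The "width only, set once per program" rule can be modelled roughly as follows. This is a Python sketch of the rule, not of the Onyx compiler: the class `StdWidths`, its method names, and the sets of allowed widths are all assumptions made for illustration.

```python
class StdWidths:
    """Program-wide default widths; each kind may be set explicitly at most once."""

    # Assumed legal widths per kind; the proposal does not list them.
    ALLOWED = {"int": (8, 16, 32, 64), "real": (32, 64)}

    def __init__(self, arch_int_bits=64, arch_real_bits=64):
        # The defaults stand in for the automatically chosen Arch* widths.
        self._widths = {"int": arch_int_bits, "real": arch_real_bits}
        self._explicit = set()

    def set(self, kind, bits):
        if kind not in self._widths:
            raise KeyError(kind)
        if bits not in self.ALLOWED[kind]:
            raise ValueError(f"unsupported {kind} width: {bits}")
        if kind in self._explicit:
            raise RuntimeError(f"std-{kind}-width was already set for this program")
        self._explicit.add(kind)
        self._widths[kind] = bits

    def get(self, kind):
        return self._widths[kind]

widths = StdWidths()
widths.set("real", 64)  # corresponds to 'std-real-width = 64
widths.set("int", 32)   # corresponds to 'std-int-width = 32
```

A second `widths.set("int", 64)` raises `RuntimeError` in this model, and there is no way to pass a type such as BigInt, only a bit width.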

stugol commented 8 years ago

I don't really see the point. If I want a specific size of int, I'll just specify Int64 or something. Changing the size of a bare int, program-wide, is asking for trouble.

ozra commented 8 years ago

What's your reasoning?

stugol commented 8 years ago

Well, under what circumstances would anyone wish to use this feature?

And if they did, surely it would cause more harm than good? If I move a function from one file to another, suddenly it could be using a different type of int.

int should be arch-dependent. And if I specifically require a 64-bit int, I will specify it at the point of use.

ozra commented 8 years ago

I see your concern, but no: you're only allowed one choice for the whole program regarding the width of the int/real; it cannot differ between files.

For most of the cases the default is fine.

But if, for some reason, you're making some specialist application where a lot of integer division occurs, then a substantial speedup could be gained by switching from 64 bit to 32 bit.

Likewise, if someone is writing an app that works with rather big integers throughout, it could just as well use Int for a cleaner appearance and simply declare that Int must be 64 bit for this program.

ozra commented 8 years ago

I edited the proposition; one does not want to change the arch types that are set automatically...

stugol commented 8 years ago

Wouldn't it be far more readable to specify Int64 throughout if that's what the programmer intended? It just seems sloppy to redefine things like that.

stugol commented 8 years ago

What about built-in library functions? If I redefine Int to be Int32, and I call a library function that returns Int, what do I get? Aren't library functions baked into the compiler when you build it?

ozra commented 8 years ago

If someone makes the change for performance reasons, for instance, the width is not an intended requirement of the code, simply a measure taken to improve run time. If the code were deployed to another architecture, that one might handle those instructions blazingly fast, and then it's better to use a different int width.

This is not supposed to be used every day, any more than "returns-twice" is. It's a power-user feature. I just think it's good not to lock the user out of the option of choosing. I'm still keeping this open for debate, though.

Library functions are included in your program as source "modules", just like the rest of the program, and participate in the entire compile process, inference and all, so they will happily comply.

ozra commented 8 years ago

I've updated the OP: SomeInt is now AnyInt, per the suggestion, which is better. Hence also Any (mirroring Object in Crystal).

ozra commented 8 years ago

Edited OP, look there for current namings.