nim-lang / RFCs

A repository for your Nim proposals.

Sets: don't assume a uint16-like range (0 to 2^16 - 1) as the default type. #298

Closed arjunkathuria closed 1 year ago

arjunkathuria commented 3 years ago

Sets: don't assume uint16 as the default type

Abstract

Nim's set type (one of the language's advanced types) by default assumes a uint16-sized base type when you give it a range of numbers with no explicit type, e.g.:

var a = {0..10}

echo sizeof a  # outputs 8192 bytes : |

for which it ends up allocating 2^16 bits (the size of the bit array sets use, per the docs), i.e. 8192 bytes, even when asked for a set of just 0..10.
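
For contrast, a minimal check (a sketch, assuming a recent Nim compiler; the variable names are mine): a set with an explicitly small base type is sized to that base type, so the 8 KiB above comes purely from the inferred default.

var small: set[uint8]   # 256 possible elements -> 256 bits
echo sizeof(small)      # 32 bytes

var inferred = {0..10}  # base type falls back to the full 0..65535 range
echo sizeof(inferred)   # 8192 bytes (65536 bits)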

Motivation

I am new to the language and was making my way through the Nim tutorial on the nim-lang website when I encountered the set section, where it says:

The set type models the mathematical notion of a set. The set's basetype can only be an ordinal type of a certain size, namely:

    int8-int16
    uint8/byte-uint16
    char
    enum

or equivalent. For signed integers the set's base type is defined to be in the range 0 .. MaxSetElements-1 where MaxSetElements is currently always 2^16.

The reason is that sets are implemented as high performance bit vectors. Attempting to declare a set with a larger type will result in an error

This caught my eye because I was reading it closely.

So a set is always a 2^16-bit vector, eh ~.^

I tried to test out how that works. I fired up the inim interpreter and made a new set:

var a = {0..10}

As it happened, out of curiosity, I checked what the size of this set was:

sizeof a
# it said 8192 bytes

I tried it again to see if it was a mistake. Not at all: it did in fact allocate 2^16 bits of memory even when I asked for a set of just {0..10}.

This really isn't ideal.

People coming from other languages assume it works the other way round, i.e. that it allocates just what you ask for and infers the integer type from what was asked, instead of assuming a uint16 type. That default also disallows something like {-10..10}, uint16 being unsigned.

People expect to be able to make sets that include negative numbers; disallowing them should NOT be the default behaviour.

This is the motivation.

Description

As I mentioned above, I encountered the problem of it allocating a lot more space than needed.

I went to the Nim Discord channel to see if it was the intended behaviour.

It was there that I found the issue: sets assume a default uint16 base type for integers when the type isn't explicitly mentioned.

It also brought up the issue of not being able to include negative numbers in sets by default:

var a = {-10..10} 
# Throws an error

This really isn't something you should need to think about. You should be able to add negative numbers to a set without it throwing an error by default.
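
In the meantime, one workaround (a sketch, not part of this proposal): std/sets provides HashSet, which accepts any hashable element, including negative integers, at the cost of hashing instead of a plain bit vector.

import std/sets

var b = toHashSet([-10, -3, 0, 7])  # negative members are fine here
b.incl(42)
echo -10 in b                       # true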

How to implement

I have no concrete idea how to implement it; I'd have to leave that to the team.

That being said, if someone is up for guiding me through implementing it, I'd be happy to put in the time and make this my first contribution to the Nim compiler.

Otherwise, I'd be happy to have brought this to notice : )

I do not see any downsides to this proposal.

Examples

Here they are.

Before

var a = {0..10}
echo sizeof(a) # echoes 8192

var b = {-10..10}
# throws an error

After

var a = {0..10}
echo sizeof(a) # not 8192 bytes at least : /

var b = {-10..10} # lets me make this, no errors.

Result

Makes the set type more ergonomic to use.

Backward incompatibility

Not sure, but this seems to be a superset of the current behaviour, so it should probably be alright.

RSDuck commented 3 years ago

Sets of integers (int8, uint8, …) are pretty rare in practice. Most of the time sets are used either with enums or with specific ranges:

type MyRange = range[0..10]

let x = {MyRange(0)..5}
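
Extending that snippet (a sketch; the exact packed size is my assumption about current compilers): with an explicit range type, the set is sized to the range rather than to 2^16.

type MyRange = range[0..10]

let x = {MyRange(0)..5}
echo sizeof(x)  # 2 bytes: 11 possible elements, rounded up to 16 bits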

I tried it again to see if it was a mistake. Not at all: it did in fact allocate 2^16 bits of memory even when I asked for a set of just {0..10}.

But the Nim compiler can't read your mind to determine what kind of numbers you want to put in later.

I think the problem is more that it's not really possible to determine the type of a set literal like {4, 5, 1} containing integer literals of unspecified size, so it has to be settled by convention.
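
To illustrate that ambiguity (a sketch; the inferred types are my reading of current behaviour, not a spec quote): nothing in the first literal pins down a base type, while a single suffixed literal does.

let a = {4, 5, 1}      # only unsuffixed literals: falls back to the 0..65535 convention
echo sizeof(a)         # 8192

let b = {4'u8, 5, 1}   # the 'u8 suffix pins the base type to uint8
echo sizeof(b)         # 32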

Araq commented 3 years ago

The language definition should stop guessing. The compiler should demand a construct like {MyRange(0)..5}.

disruptek commented 3 years ago

Agreed; I ran into the same issue in the same project. But I don't see why the spec needs to be tighter: as long as we demand an Ordinal, I don't see why we can't determine the required bits and scale the underlying type accordingly. I originally implemented this in a macro. I'm not saying it's not a breaking change, just that it doesn't seem to break the spec, AFAICT.
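
A sketch of that sizing rule (requiredBytes is a hypothetical helper for illustration, not disruptek's actual macro): one bit per possible element, rounded up to whole bytes.

proc requiredBytes(lo, hi: int): int =
  # One bit per possible element in lo..hi, rounded up to whole bytes.
  let elements = hi - lo + 1
  (elements + 7) div 8

echo requiredBytes(500, 600)  # 13 bytes for range[500..600]
echo requiredBytes(0, 10)     # 2 bytes for {0..10}
echo requiredBytes(-10, 10)   # 3 bytes for {-10..10}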

disruptek commented 3 years ago

To clarify: set[range[500..600]] should occupy about 13 bytes (101 bits), not 8 KiB. :wink:

timotheecour commented 3 years ago

To clarify: set[range[500..600]] should occupy about 13 bytes (101 bits), not 8 KiB

Maybe, maybe not, but that's in contradiction with https://github.com/nim-lang/Nim/issues/16270#issuecomment-740214558

disruptek commented 3 years ago

Well, whoever contradicted me is wrong. What can I say?

Is there an argument against efficiency that you think is worth reproducing here?