matthias314 / SmallCollections.jl

A Julia package providing variable-length set and vector types that don't allocate
https://matthias314.github.io/SmallCollections.jl/
MIT License
26 stars 0 forks source link

Splitting functionality into different Packages #4

Open favba opened 1 week ago

favba commented 1 week ago

Hi there!

The SmallVector provided by this package is something that I needed a while ago. I didn't find your package back then and ended up reproducing the functionality here. I remember I looked into every package in the JuliaArrays organization looking for this. I just found it through a comment of yours in a post in the Julia discourse forum.

Maybe I just didn't do enough research, but I thought it might be a good idea to split the functionality provided by this package into separate packages (e.g., SmallVectors.jl, SmallDicts.jl, etc.) and make this one a meta package that exports all of them. Then SmallVectors could live in the JuliaArray organization and maybe improve its discoverability.

matthias314 commented 1 week ago

ended up reproducing the functionality

Interesting! Any lessons learnt that you can share? For example, I've seen that you use UInt8 to store the vector length. While I'm currently using Int, I've been wondering if some other type (say, UInt8 or Int16) would be better (i.e., saving memory without affecting performance).

split the functionality provided by this package into separate packages. [...] Then SmallVectors could live in the JuliaArray organization and maybe improve its discoverability.

I'm almost done with adding several new types (FixedVector, SmallDict, SmallSet plus their mutable counterparts including MutableSmallVector, see the dev documentation). Once this is done, I want to announce the package on Discourse. I hope that will make it better known.

@mbauman I'm pinging you as a member of @JuliaArrays. What would be your recommendation about splitting this package and/or putting it under the @JuliaArrays umbrella? I don't have any experience with this kind of question.

favba commented 1 week ago

Interesting! Any lessons learnt that you can share? For example, I've seen that you use UInt8 to store the vector length. While I'm currently using Int, I've been wondering if some other type (say, UInt8 or Int16) would be better (i.e., saving memory without affecting performance).

Since UInt8 uses only 1 byte, it gives "Julia" more freedom when padding the ImmutableVector data, instead of being constrained by the 8 bytes from a Int field:

julia> using ImmutableVectors, SmallCollections

julia> im = ImmutableVector{7,Float16}(1,2,3,4)
4-element ImmutableVector{7, Float16}:
 1.0
 2.0
 3.0
 4.0

julia> sm = SmallVector{7,Float16}(im)
4-element SmallVector{7, Float16}:
 1.0
 2.0
 3.0
 4.0

julia> sizeof(im)
16

julia> sizeof(sm)
24

julia> imb = ImmutableVector{9,Bool}(1,0,0,0,1)
5-element ImmutableVector{9, Bool}:
 1
 0
 0
 0
 1

julia> smb = SmallVector{9,Bool}(imb)
5-element SmallVector{9, Bool}:
 1
 0
 0
 0
 1

julia> sizeof(imb)
10

julia> sizeof(smb)
24

julia> struct my_struct
           a::Float16
           l::Bool
       end

julia> ims = ImmutableVector{7,my_struct}(my_struct(1,1))
1-element ImmutableVector{7, my_struct}:
 my_struct(Float16(1.0), true)

julia> sms = SmallVector{7,my_struct}(ims)
1-element SmallVector{7, my_struct}:
 my_struct(Float16(1.0), true)

julia> sizeof(ims)
30

julia> sizeof(sms)
40

The difference in size only goes away when using elements of 8 bytes (in 64bit systems):

julia> im64 = ImmutableVector{6}(1.0,2.0,3.0)
3-element ImmutableVector{6, Float64}:
 1.0
 2.0
 3.0

julia> sm64 = SmallVector{6}(im64)
3-element SmallVector{6, Float64}:
 1.0
 2.0
 3.0

julia> sizeof(im64)
56

julia> sizeof(sm64)
56

However, an obvious disadvantage of using UInt8 is that you now have a hard definition of what "Small" means. It means < 256, which was more than necessary for my use case. But I think it is a reasonable constraint, If I'm not mistaken it is around that tuple length that things start working strangely for tuples (like it starts getting heap allocated, or functions not getting inlined, etc, but I'm not sure of that).

If I have time later I'll try including the ImmutableVector in the benchmark you have to see if there is any difference in performance. And maybe include some benchmark for my typical usage of such vectors (constructing them in hot loops)

matthias314 commented 1 week ago

In the meantime I added a type SmallLength for the length field of a SmallVector or MutableSmallVector, see 332135d in master. It's currently set to Int16 because on the computers I tried UInt8 was somewhat slower and Int8 might be too small. You can play around with it and see if you get similar results.

favba commented 6 days ago

Cool, I'll try the master branch out. It also occurred to me that another option is to make this a user choice, by parameterizing the n field. I believe maintaining the same API would still be possible by defining a type alias. Something like:

struct ImmutableVector{N,T,TL<:Integer} <: AbstractSmallVector{N,T}
    b::Values{N,T}
    n::TL
end

const SmallVector{N,T} = ImmutableVector{N, T, SmallLength}
export SmallVector
matthias314 commented 6 days ago

parameterizing the n field

That looks to complicated to me for such a minor aspect.