jckarter / clay

The Clay programming language
http://claylabs.com/clay

Improve Function performance #489

Open galchinsky opened 11 years ago

galchinsky commented 11 years ago

This is a feature request or an RFC. Current C++ implementations of std::function usually store small closures inside the function object itself to avoid heap allocation. As far as I can see, Clay's Function always allocates its closure on the heap, and this could be changed.

stepancheg commented 11 years ago

Modern allocators (like tcmalloc) are very efficient at allocating small objects, and Clay should probably have a core library that is easy to maintain rather than highly optimized.

jckarter commented 11 years ago

The representation of Function could be improved while retaining a high-level, easy-to-maintain structure by making it a variant: Function (FunctionWithSmallClosure, FunctionWithHeapClosure).

stepancheg commented 11 years ago

There's a simpler solution:

record SmallInPlace (
    data: Array[Byte, TypeSize(RawPointer)],
);

record LargeInHeap (
    data: RawPointer,
);

// not generic
variant MemoryHolderSmallInPlaceLargeInHeap (SmallInPlace, LargeInHeap);

allocateMemorySmallInPlaceLargeInHeap(T): MemoryHolderSmallInPlaceLargeInHeap =
    if (TypeSize(T) <= TypeSize(SmallInPlace().data))
        SmallInPlace()
    else
        LargeInHeap(allocateRawMemory(TypeSize(T)));

and then

record Function[In, Out] (
    obj: MemoryHolderSmallInPlaceLargeInHeap,
    ...
);

Function then has only one implementation.

Note that this implementation (like the variant Function implementation) won't always be faster than the current implementation, because malloc is cheap, but a branch misprediction isn't.
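For readers more familiar with C++, the memory holder idea above might look roughly like the following. This is a minimal sketch in C++, not Clay, and not taken from any library: the names `MemoryHolder`, `kInlineSize`, `data` and `inHeap` are hypothetical, and the inline capacity of one pointer simply mirrors `TypeSize(RawPointer)` in the Clay snippet.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical C++ analogue of MemoryHolderSmallInPlaceLargeInHeap.
// It is not a template, so a single type serves closures of any size.
class MemoryHolder {
public:
    // Inline capacity: one pointer's worth of bytes (an assumption that
    // mirrors Array[Byte, TypeSize(RawPointer)] in the Clay sketch).
    static constexpr std::size_t kInlineSize = sizeof(void*);

    explicit MemoryHolder(std::size_t size) : in_heap_(size > kInlineSize) {
        if (in_heap_) {
            void* p = std::malloc(size);
            if (p == nullptr) throw std::bad_alloc();
            // Large case: the inline bytes hold the heap pointer itself.
            std::memcpy(storage_, &p, sizeof p);
        }
    }

    // Copying is disabled to keep ownership of the heap block simple.
    MemoryHolder(const MemoryHolder&) = delete;
    MemoryHolder& operator=(const MemoryHolder&) = delete;

    ~MemoryHolder() {
        if (in_heap_) std::free(data());
    }

    // Returns the address where the closure bytes live.
    // This is the branch that the comment above is concerned about.
    void* data() {
        if (!in_heap_) return storage_;
        void* p;
        std::memcpy(&p, storage_, sizeof p);
        return p;
    }

    bool inHeap() const { return in_heap_; }

private:
    alignas(void*) unsigned char storage_[kInlineSize];
    bool in_heap_;
};
```

In this sketch, `MemoryHolder(sizeof(int))` stays in place while `MemoryHolder(64)` falls back to `malloc`. Whether the extra branch in `data()` costs more than the saved allocation would have to be measured.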

jckarter commented 11 years ago

Indeed, factoring out the memory holder is a good idea to avoid needless instantiation. I would guess though that, even if you have a fast malloc, locality and heap size efficiency would end up being bigger factors than branch misprediction in a larger application. That's why libc++ favors size over speed and large C++ projects like LLVM and WebKit make heavy use of custom in-place SmallVector/SmallString/SmallDenseMap/etc. containers. In the case of Function, there's an indirect call to an underlying function pointer anyway, which will probably be opaque to the branch predictor no matter what.

galchinsky commented 11 years ago

Construction usually happens far less often than invocation. That's why an object's cache friendliness often matters more than some extra creation overhead.
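One way to picture this trade-off is a function wrapper that decides between in-place and heap storage once, at construction, and then pays only the indirect call on every invocation. The following is a hedged C++17 sketch, not Clay's Function and not how any particular std::function is implemented: `SmallFunction`, `kInlineSize` and `inHeap` are hypothetical names, and the wrapper is deliberately non-copyable and non-movable so that the cached object pointer stays valid.

```cpp
#include <cstddef>
#include <new>
#include <utility>

template <typename Signature>
class SmallFunction;

template <typename Out, typename... In>
class SmallFunction<Out(In...)> {
public:
    // Assumed inline capacity: one pointer's worth of bytes.
    static constexpr std::size_t kInlineSize = sizeof(void*);

    template <typename F>
    SmallFunction(F f) {
        // The size decision is made once per closure type, at construction.
        if constexpr (sizeof(F) <= kInlineSize && alignof(F) <= alignof(void*)) {
            obj_ = new (inline_) F(std::move(f));  // small: stored in place
            destroy_ = [](void* p) { static_cast<F*>(p)->~F(); };
        } else {
            obj_ = new F(std::move(f));            // large: stored on the heap
            destroy_ = [](void* p) { delete static_cast<F*>(p); };
        }
        // Type-erased trampoline that calls the stored closure.
        invoke_ = [](void* p, In... args) -> Out {
            return (*static_cast<F*>(p))(std::forward<In>(args)...);
        };
    }

    // Non-copyable and non-movable: obj_ may point into this object.
    SmallFunction(const SmallFunction&) = delete;
    SmallFunction& operator=(const SmallFunction&) = delete;

    ~SmallFunction() { destroy_(obj_); }

    // Invocation is a single indirect call with no size check.
    Out operator()(In... args) const {
        return invoke_(obj_, std::forward<In>(args)...);
    }

    bool inHeap() const { return obj_ != static_cast<const void*>(inline_); }

private:
    alignas(void*) unsigned char inline_[kInlineSize];
    void* obj_;
    Out (*invoke_)(void*, In...);
    void (*destroy_)(void*);
};
```

With this layout, a captureless lambda is stored inside the wrapper and a lambda capturing several values goes to the heap, while both are called through the same function pointer. Supporting moves would require either re-deriving the object pointer on each call or fixing it up on move, which is where the branch discussed earlier in the thread could reappear.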