Discussion on allocation of Tachyon

tianyin commented 9 years ago

I'd like to start a discussion on how to measure the allocation overhead of Tachyon.

Assumptions

First, as I understand it, only the allocation overhead of the workers (which store the real data) matters; the master only maintains metadata, so its allocation overhead is not interesting. Correct me if I'm wrong, @stormspirit, @jakemask.

Attempts

I managed to apply the java-allocation-instrumenter (https://github.com/google/allocation-instrumenter) to the Tachyon workers, and all the tests passed.

Difficulties

However, I ran into difficulties analyzing the recorded overhead. First, the instrumenter does not report the time overhead; it only records the size of each allocated object.

Second, the instrumenter works by inserting instructions after each `new` statement, which does not reveal the real allocation overhead. This is a general problem for instrumentation based on `new` or `malloc`.

Long story short, there are two big problems that I don't know how to solve:

1. The recorded log does not tell the real size of the object

Consider the following code snippet:

    new String("foo");
    new String("fooo");
    new String("foooo");
    new String("fooooo");
    new String("foooooo");
    new String("fooooooo");

And the logs show,

Allocated object foo of type java/lang/String with size 24
Allocated object fooo of type java/lang/String with size 24
Allocated object foooo of type java/lang/String with size 24
Allocated object fooooo of type java/lang/String with size 24
Allocated object foooooo of type java/lang/String with size 24
Allocated object fooooooo of type java/lang/String with size 24
Allocated object java.lang.Object@76724124 of type java/lang/Object with size 16
Allocated object [C@78b5a19a of type char with size 48 (an array of size 13)
Allocated object com.google.monitoring.runtime.instrumentation.asm.ClassReader@3988aba1 of type com/google/monitoring/runtime/instrumentation/asm/ClassReader with size 32
Allocated object [C@5d97b6d5 of type char with size 80 (an array of size 32)
Allocated object [C@8fb758c of type char with size 32 (an array of size 6)
Allocated object [C@250e364d of type char with size 80 (an array of size 30)
Allocated object [C@59a754ee of type char with size 32 (an array of size 6)
Allocated object [C@39667011 of type char with size 72 (an array of size 27)
Allocated object [C@6a6235a4 of type char with size 64 (an array of size 21)
Allocated object [C@3e0adafb of type char with size 24 (an array of size 3)
Allocated object [C@5de9e9f0 of type char with size 96 (an array of size 37)
Allocated object [C@5705fca4 of type char with size 136 (an array of size 59)
Allocated object [C@2cb12cf5 of type char with size 72 (an array of size 25)
Allocated object [C@4484f69a of type char with size 24 (an array of size 4)
Allocated object [C@56302193 of type char with size 24 (an array of size 1)
Allocated object [C@2ea8072d of type char with size 40 (an array of size 11)
Allocated object [C@bdb4089 of type char with size 64 (an array of size 21)
Allocated object [C@6808574e of type char with size 32 (an array of size 6)
Allocated object [C@3f63cf65 of type char with size 96 (an array of size 38)
Allocated object [C@511c26cb of type char with size 56 (an array of size 17)
......

From the first six lines, we can see that the String objects all have the same size. What differs is the field objects inside the strings. However, since the instrumenter does not provide a more informative interface, this is pretty much all we can get. I don't know how to leverage this information, other than by doing a type-based analysis.
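As a rough illustration of what a type-based analysis could look like, here is a minimal sketch that parses log lines in the format shown above and sums the reported sizes per type. The class name `AllocationLogSummary` and the `[]` suffix for array entries are my own choices for illustration, not part of the instrumenter.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: sums the sizes reported in the log, grouped by type. */
class AllocationLogSummary {
    // Matches lines such as:
    //   Allocated object foo of type java/lang/String with size 24
    //   Allocated object [C@8fb758c of type char with size 32 (an array of size 6)
    private static final Pattern LINE = Pattern.compile(
        "^Allocated object .* of type (\\S+) with size (\\d+)( \\(an array of size \\d+\\))?$");

    /** Returns total bytes per type; array allocations get a "[]" suffix. */
    static Map<String, Long> bytesByType(Iterable<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // skip elided or unrelated lines such as "......"
            }
            String type = m.group(1) + (m.group(3) != null ? "[]" : "");
            totals.merge(type, Long.parseLong(m.group(2)), Long::sum);
        }
        return totals;
    }
}
```

This only groups what the log already reports; it cannot attribute a given char array to the String that owns it.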

2. The instrumenter only looks at the `new` statement.

In fact, if Tachyon or any other system performs pre-allocation (e.g., allocating a pool of memory during initialization) and then returns memory blocks from that pool when a client requests files/blocks, the instrumenter tells us nothing, because no `new` statement is invoked at that time.
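To make the problem concrete, here is a minimal sketch of a pre-allocating pool. It is a hypothetical class written for illustration, not Tachyon code. All buffers are created in the constructor, so an instrumenter that hooks `new` would see activity only at initialization and nothing when `request()` is called.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size buffer pool; for illustration only, not Tachyon code. */
class PreallocatedPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private int bufferAllocations = 0; // counts the `new byte[]` calls

    PreallocatedPool(int buffers, int bufferSize) {
        for (int i = 0; i < buffers; i++) {
            free.push(new byte[bufferSize]); // the only place a buffer is allocated
            bufferAllocations++;
        }
    }

    /** Hands out a pooled buffer, or null if the pool is exhausted. No `new` here. */
    byte[] request() {
        return free.poll();
    }

    /** Returns a buffer to the pool so it can be handed out again. */
    void release(byte[] buffer) {
        free.push(buffer);
    }

    int bufferAllocations() {
        return bufferAllocations;
    }
}
```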

Thoughts/Solutions

Luckily, in Tachyon, the memory-allocation code is well maintained in the directory core/src/main/java/tachyon/worker/allocation

They implement three allocation strategies: RR, MaxFree, and Random, all of which inherit from the abstract class AllocateStrategyBase.

Basically, the only method that needs to be implemented is getStorageDir(), which is documented as follows:

    /**
     * Allocate space on StorageDirs. It returns the affordable StorageDir according to
     * AllocateStrategy, null if no enough space available on StorageDirs.
     *
     * @param storageDirs candidates of StorageDirs that space will be allocated in
     * @param userId id of user
     * @param requestSizeBytes size to request in bytes
     * @return StorageDir assigned, null if no StorageDir is assigned
     */

Though the algorithms for selecting the affordable StorageDir differ, the key operation is `requestSpace()`, which gets the space from the affordable storage.

So my idea is to add time counters in getStorageDir() so that we can know the allocation overhead.
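A minimal sketch of such a time counter, assuming we can wrap the body of the allocation call in a lambda. `AllocationTimer` is a hypothetical helper, not an existing Tachyon class; it uses `System.nanoTime()` and accumulates the elapsed time and call count.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical helper that accumulates wall-clock time spent in allocation calls. */
class AllocationTimer {
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    /** Runs the allocation call, records its elapsed time, and returns its result. */
    <T> T time(Supplier<T> allocation) {
        long start = System.nanoTime();
        try {
            return allocation.get();
        } finally {
            // Recorded even if the call throws, so failed requests are counted too.
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    long calls() {
        return calls.get();
    }

    long totalNanos() {
        return totalNanos.get();
    }

    /** Mean time per call in nanoseconds, or 0 if nothing was recorded. */
    double meanNanos() {
        long n = calls.get();
        return n == 0 ? 0.0 : (double) totalNanos.get() / n;
    }
}
```

Inside getStorageDir() the existing selection logic would be passed as the lambda, e.g. `timer.time(() -> ...)`, and the totals could be dumped periodically or at shutdown. The atomic counters are there because I assume allocation requests may arrive from several threads.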

tianyin commented 9 years ago

btw, the entry point of the allocation process is in WorkerStorage.java

    /**
     * Request space from the worker, and expecting worker return the appropriate StorageDir which
     * has enough space for the requested space size
     *
     * @param dirCandidate The StorageDir in which the space will be allocated.
     * @param userId The id of the user who send the request
     * @param requestBytes The requested space size, in bytes
     * @return StorageDir assigned, null if failed
     */
    private StorageDir requestSpace(StorageDir dirCandidate, long userId, long requestBytes) {
......

This includes the overhead of eviction (if there is not enough space, Tachyon will try to evict the dirs to lower-tier storage, if there is any).

ryanphuang commented 9 years ago

So the 24 bytes is constant for all Java strings? Are the char array entries that follow the String entries in the log also related to those strings? If so, that will be okay. For pre-allocation, it's difficult to trace unless we instrument the code itself. But pre-allocation represents a memory-management strategy. If it's effective, it should reduce the overhead of new objects. So I think just measuring the new objects is fine, since it roughly reflects how effective the pre-allocation is (assuming we know the mem pool)?

tianyin commented 9 years ago

yes, 24 is constant for all strings
yes, the code that generates the log contains only these `new` statements
That's what I mean. As Tachyon has a very clean interface for space allocation, we can precisely and simply know how much time it takes to request a piece of memory space.
