Open tianyin opened 9 years ago
btw, the entry point of the allocation process is in WorkerStorage.java

    /**
     * Request space from the worker, and expecting worker return the appropriate StorageDir which
     * has enough space for the requested space size
     *
     * @param dirCandidate The StorageDir in which the space will be allocated.
     * @param userId The id of the user who send the request
     * @param requestBytes The requested space size, in bytes
     * @return StorageDir assigned, null if failed
     */
    private StorageDir requestSpace(StorageDir dirCandidate, long userId, long requestBytes) {
    ......

This includes the overhead of eviction (if there is not enough space, Tachyon will try to evict the dirs to lower-tier storage, if there is any).
So the 24 bytes is constant for all Java strings? Is the char array overhead logged after the string overhead also related to the strings? If so, that will be okay.
For pre-allocation, it's difficult to trace unless we instrument the code itself. But pre-allocation represents a memory-management strategy. If it's effective, it should reduce the overhead of new objects. So I think just measuring the new objects is fine, since it roughly reflects how effective the pre-allocation is (assuming we know the mem pool)?
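On the 24-byte question, a rough back-of-the-envelope sketch may help. It is not measured from Tachyon, and every constant in it is an assumption: a 64-bit HotSpot JVM with compressed oops (12-byte object header, 16-byte array header, 4-byte references, 8-byte alignment) and a `String` layout of one `char[]` reference plus one `int` hash, as in Java 8. Under those assumptions the `String` object itself has a fixed shallow size, and only the backing `char[]` grows with the length. The class and method names are made up for illustration.

```java
/**
 * Rough size estimate for java.lang.String.
 * ASSUMPTIONS (not from this thread): 64-bit HotSpot, compressed oops,
 * 12-byte object header, 16-byte array header, 8-byte alignment, and a
 * String made of one char[] reference plus one int hash (Java 8 layout).
 */
public class StringSizeEstimate {
    static final int OBJECT_HEADER = 12; // assumed object header size
    static final int ARRAY_HEADER = 16;  // assumed header size including the length field
    static final int REFERENCE = 4;      // assumed compressed reference size
    static final int INT = 4;

    /** Rounds up to the next multiple of 8 bytes. */
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Shallow size of the String object; does not depend on its length. */
    public static long shallowStringSize() {
        return align8(OBJECT_HEADER + REFERENCE + INT);
    }

    /** Size of the backing char[] for a string of the given length. */
    public static long charArraySize(int length) {
        return align8(ARRAY_HEADER + 2L * length);
    }

    /** String object plus its backing array. */
    public static long deepStringSize(int length) {
        return shallowStringSize() + charArraySize(length);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "foo", "a much longer value"}) {
            System.out.printf("length=%d shallow=%d char[]=%d total=%d%n",
                s.length(), shallowStringSize(), charArraySize(s.length()),
                deepStringSize(s.length()));
        }
    }
}
```

If the instrumenter reports shallow sizes, as the logs described in this thread suggest, then each string would show up as one fixed-size `String` record plus one `char[]` record, and the two would have to be added together to get the per-string cost.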
I want to start a discussion on how to measure the allocation overhead of Tachyon.
Assumptions
Foremost, as I understand it, only the allocation overhead of the workers (which store the real data) matters; the master only maintains metadata, so its allocation overhead is not interesting. Correct me if I'm wrong @stormspirit, @jakemask
Attempts
I managed to apply the java-allocation-instrumenter (https://github.com/google/allocation-instrumenter) to the Tachyon workers, and the instrumented workers passed all the tests.
Difficulties
However, I ran into difficulties analyzing the recorded overhead. First, the instrumenter does not report the time overhead; it only records the size of each allocated object.
Second, the instrumenter works by inserting instructions after each `new` statement, which does not tell the real allocation overhead. This is a general problem of instrumentation based on `new` or `malloc`.
Long story short, there are two big problems (that I don't know how to solve):
1. The recorded log does not tell the real size of the object
Consider the following code snippets,
And the logs show,
From the first 4 lines, we can see that the `String` objects actually have the same size. What differs is the field objects inside the strings. However, as the instrumenter does not provide any more informative interface, this is pretty much the only thing we can get. I don't know how to leverage such information, except by doing a type-based analysis.
2. The instrumenter only looks at the `new` statement
In fact, if Tachyon or any other system performs some pre-allocation (e.g., allocating a pool of memory during initialization) and then returns the memory blocks when the client requests files/blocks, the instrumenter would tell us nothing, because no `new` statement is invoked at that time.
Thoughts/Solutions
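To make the second problem above concrete, here is a minimal sketch of a pre-allocating pool. It is a hypothetical class, not Tachyon code, and it needs Java 7 or later. All buffers are created in the constructor, so a `new`-based instrumenter would see allocations only during initialization. The `allocationCount()` counter is a hand-written stand-in for what such an instrumenter would record.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/**
 * Hypothetical block pool (not Tachyon code): every buffer is allocated once,
 * up front, and later requests only hand out existing objects.
 */
public class PreallocatedPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    // Counts buffer allocations made by this pool; a stand-in for what an
    // allocation instrumenter would observe.
    private int allocations = 0;

    public PreallocatedPool(int blocks, int blockBytes) {
        for (int i = 0; i < blocks; i++) {
            free.push(ByteBuffer.allocate(blockBytes)); // the only buffer allocation site
            allocations++;
        }
    }

    /** Returns a pre-allocated block, or null if the pool is exhausted. */
    public ByteBuffer request() {
        return free.poll();
    }

    /** Resets the block and puts it back into the pool. */
    public void release(ByteBuffer block) {
        block.clear();
        free.push(block);
    }

    public int allocationCount() {
        return allocations;
    }

    public static void main(String[] args) {
        PreallocatedPool pool = new PreallocatedPool(4, 1024);
        int before = pool.allocationCount();
        ByteBuffer block = pool.request(); // served from the pool
        System.out.println("allocations during request: "
            + (pool.allocationCount() - before));
        pool.release(block);
    }
}
```

In a design like this the cost a client sees at request time is the bookkeeping in `request()`, which size-based allocation logs cannot capture.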
Luckily, in Tachyon, the memory-allocation code is well maintained in the directory `core/src/main/java/tachyon/worker/allocation`. It implements three allocation strategies: RR, MaxFree, and Random, all of which inherit from the abstract class `AllocateStrategyBase`. Basically, the only method a strategy is required to implement is `getStorageDir()`.
Though the algorithms for selecting an affordable StorageDir differ, the key operation is `requestSpace()`, which gets the space from the affordable storage.
So my idea is to add time counters in `getStorageDir()` so that we can know the allocation overhead.
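A minimal sketch of what such a time counter could look like, assuming Java 8 or later. The `AllocationTimer` class and its methods are hypothetical, and the `Supplier` is only a placeholder for a call such as `getStorageDir()`; none of this is Tachyon's actual API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * Hypothetical timing helper: accumulates wall-clock time spent in an
 * allocation call. The Supplier stands in for something like getStorageDir().
 */
public class AllocationTimer {
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    /** Runs the given call and records how long it took, even if it throws. */
    public <T> T time(Supplier<T> allocation) {
        long start = System.nanoTime();
        try {
            return allocation.get();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    public long calls() {
        return calls.get();
    }

    public long totalNanos() {
        return totalNanos.get();
    }

    /** Mean time per call in nanoseconds, or 0 if nothing was recorded. */
    public double meanNanos() {
        long n = calls.get();
        return n == 0 ? 0.0 : (double) totalNanos.get() / n;
    }

    public static void main(String[] args) {
        AllocationTimer timer = new AllocationTimer();
        for (int i = 0; i < 1000; i++) {
            // Placeholder workload; a real measurement would wrap the strategy call.
            timer.time(() -> new byte[4096]);
        }
        System.out.printf("calls=%d mean=%.1f ns%n", timer.calls(), timer.meanNanos());
    }
}
```

Since `requestSpace()` may also trigger eviction, as noted earlier in this thread, a counter placed around `getStorageDir()` would include eviction time. Timing `requestSpace()` separately would be one way to tell the two apart.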