Open daniel-rusu opened 1 year ago
An alternative approach would be to have 2 measurement stacks, one for arrays and the other for regular objects so the logic would look something like this:
while (arrayStack.isNotEmpty() || objectStack.isNotEmpty) {
while (arrayStack.isNotEmpty()) {
val array = arrayStack.pop()
for (element in array) {
if (!shouldSkip(element)) {
measureObject(element)
}
}
}
while (objectStack.isNotEmpty()) {
measureObject(objectStack.pop())
}
}
To clarify, when encountering an array with N eligible elements, Jamm currently makes 2N copies. Inside the MeasurementStack, all the array element references get copied to the tracker, and another copy of all the references gets added to the stack.
This enhancement cuts the memory usage in half since the references only need to be lazily added to the tracker without needing to copy them to the stack.
I tried an implementation using a variation of the first proposal above so the rest of the code can remain mostly unchanged. The MeasurementStack contained 2 stacks, objectStack & arrayStack, and found that this improved performance by about 5% while cutting the memory consumption in half since both stacks remained tiny and only the tracker grew to keep track of all the elements.
Since the process of measuring an array element can end up adding additional arrays, we need to keep track of where we left off so the arrayStack stored an ArrayWrapper that contained a reference to the array along with the next index to be processed. When fetching the current array element, the index would be decremented until finding the next eligible element and the wrapper is removed from the stack when it contains no more elements. The isEmpty function returns true when both stacks are empty.
@daniel-rusu I am not sure that I understand your way of computing memory usage. Considering the way Java deal with memory when we add an array to the stack we only copy the reference to the array not its value. Therefore, the impact on memory is near neglectable.
@blerer there seems to be a misunderstanding here as this has nothing to do with the Java memory model. Yes, when first encountering an array field, a reference to that array is added to the stack and this is negligible as you say. However, when popping that reference and discovering that it's pointing to an array then every single eligible array element reference is copied to the stack. See this section of the code for details:
So this does actually cut the memory usage of the Jamm library in half when analyzing a large array of fairly shallow objects using the first proposal assuming we prioritize popping from the object stack. That's because each array element is popped directly from the array and analyzing that element would add a small number of references to the stack to be analyzed and emptied before popping the next array element. So instead of blowing up the size of the stack by copying all the array elements, the stack would instead remain tiny since only direct field references would be copied there and cleared before moving to the next array element.
Sorry, I did not read your 3rd comment before. I now understand your idea and it make sense. Fill free to propose a patch :-)
Instead of copying collections by pushing each array element individually onto the stack (in the MeasurementStack), we could create a
pushArray
method in theMeasurementStack
that keeps track of the entire array reference instead of copying all the elements from the array. The measurement stack could keep track of the index into the array so when the pop method is called and an array is present then fetch that element from the array at that index and decrement the index. When the index becomes negative then actually pop the array itself.This will make analyzing collections much more efficient in both time and space.