nodejs / node-addon-api

Module for using Node-API from C++

Large arrays processing #317

Closed · bsrdjan closed this issue 6 years ago

bsrdjan commented 6 years ago

After receiving a Napi::Array from the Node application, I have a for loop that passes each row to an external C++ library, creating another array instance there for further processing.

At one point in time there are two copies of the array data: the Napi::Array and the array in the external library. With large arrays, memory bottlenecks may occur, as discussed on Stack Overflow: SAP RFC: Chunking of big data.

What would be an efficient method to process the Napi::Array and free each row after passing it to the external library?

Here is how it works in Python; I am looking for something like splice() or Python's del table[:1] in N-API to do the same:

while len(table) > 0:
    # append a new row in the external library's container
    lineHandle = RfcAppendNewRow(container, errorInfo)
    # copy the first remaining row into the new row
    line = table[0]
    fillStructureField(typeDesc, lineHandle, name, line)
    # then free the copied row:
    # https://stackoverflow.com/questions/33626623/the-most-efficient-way-to-remove-first-n-elements-in-a-list
    del table[:1]
DaAitch commented 6 years ago

@bsrdjan since garbage collection may only take place at some point in the future, you may not get rid of memory bottlenecks. I don't completely get your use case, but if you do not want to "shrink" your array while iterating, you could save an index to know which parts not to copy (see the sketch below).

Another point is to rethink your design. Big arrays mostly come from IO, like reading a huge file. The more Node.js-like answer for handling lots of data is streaming, which keeps the bottleneck at IO and the memory footprint small. As long as IO is the bottleneck, N-API will not boost your application.
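
For illustration, a minimal sketch of the index idea, assuming a hypothetical appendRowToLibrary() helper in place of the real library calls: the addon copies a bounded chunk per call and returns the next offset, so the JS side drives the loop and can release its references between chunks.

#include <napi.h>
#include <algorithm>

// Hypothetical helper: copies one row into the external library's table.
void appendRowToLibrary(const Napi::Object& row);

// Copy at most chunkSize rows per call, starting at the offset in info[1].
// Returns the next offset; the caller is done when it equals array.Length().
Napi::Value CopyChunk(const Napi::CallbackInfo& info) {
    Napi::Env env = info.Env();
    Napi::Array array = info[0].As<Napi::Array>();
    uint32_t offset = info[1].As<Napi::Number>().Uint32Value();
    const uint32_t chunkSize = 1000; // tune for your row size

    uint32_t end = std::min(offset + chunkSize, array.Length());
    for (uint32_t i = offset; i < end; i++) {
        appendRowToLibrary(array.Get(i).As<Napi::Object>());
    }
    return Napi::Number::New(env, end);
}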

bsrdjan commented 6 years ago

My Node addon is a wrapper around a C/C++ library which does not support streaming. Streaming would be the final and optimal solution, but adding it requires more effort, which is not planned for the time being. I am therefore trying to optimise what is possible in the near term.

The big Napi::Array comes from Node.js to the addon in a single addon method call. The addon takes the first Napi::Array element and passes it to the external library, which starts building a copy of the array there.

After the loop over the Napi::Array finishes in the addon, there are two copies of the array: one on the JS heap and one on the C/C++ heap of the external library.

To avoid the full duplication, I was thinking about removing each Napi::Array element while iterating, after its copy has been created in C/C++.

In the C/C++-to-addon direction, the duplication is prevented by deleting each C/C++ element after its copy is added to the Napi::Array. To achieve the same in the addon-to-C/C++ direction, I was looking for a method like splice() and could not find one.
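
For illustration, a minimal sketch of that C/C++-to-addon pattern, with a std::deque<std::string> standing in for the external library's table (the names here are hypothetical):

#include <napi.h>
#include <deque>
#include <string>

// Build the Napi::Array row by row and free each C/C++ row right after it
// is copied, so at most one extra row is alive at any time.
Napi::Array RowsToArray(Napi::Env env, std::deque<std::string>& rows) {
    Napi::Array array = Napi::Array::New(env, rows.size());
    uint32_t i = 0;
    while (!rows.empty()) {
        array.Set(i++, Napi::String::New(env, rows.front()));
        rows.pop_front(); // delete the C/C++ copy immediately
    }
    return array;
}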

The problem is referenced in SAP RFC: Chunking of big data and I wonder if this could be relevant for other addons wrapping external libraries which do not support streaming.

DaAitch commented 6 years ago

@bsrdjan you could give it a try. Napi::Array is a Napi::Object and supports Delete:

/// Deletes an indexed property or array element.
bool Delete(
  uint32_t index ///< Property / element index
);

See if this helps you.

bsrdjan commented 6 years ago

Thanks for the hint. I tried it and did not get the expected result. Here is the loop:

Napi::Array array = value.As<Napi::Array>();
uint32_t rowCount = array.Length();

for (uint32_t i = 0; i < rowCount; i++)
{
    // copy row i into the external library's table
    RFC_STRUCTURE_HANDLE structHandle = RfcAppendNewRow(tableHandle, &errorInfo);
    Napi::Value rv = fillStructure(structHandle, functionDescHandle, cName, array.Get(i).As<Napi::Object>());

    // try to free the JS copy of the row
    bool b = array.Delete(i);
    uint32_t j = array.Length();
    printf("\n%s %d", b ? "true" : "false", j);

    if (!rv.IsUndefined())
        return rv;
}

and the output, for an input Napi::Array of length 10:

true 10
true 10
true 10
true 10
true 10
true 10
true 10
true 10
true 10
true 10

Delete() returns true and the elements are definitely not deleted. Any thoughts on how that could be?

Assuming it works, should there be any difference in performance depending on whether the first or the last element is deleted? My external lib is slow (with a large array) when the first element is removed, because all the others must be shifted. Removing the last one is always fast. Is Delete() sensitive in that regard?

DaAitch commented 6 years ago

@bsrdjan

Delete() returns true and the elements are definitely not deleted. Any thoughts on how that could be?

true only states that the delete operation succeeded. Immutable structures would return false.

Assuming it works, should there be any difference in performance depending on whether the first or the last element is deleted?

I would say it really depends on the engine. Neither pop nor shift is expensive, and it's not very predictable which performs better. If you want to remove many items at once, e.g. 100, I think it's better to consume them from the back and do it like this: arr.length -= 100 (yes, JS is sometimes crazy). But that's not possible with N-API.

Anyway, don't focus on the array length; it should be no problem. Focus on getting the huge objects inside it recycled, by dropping all references to them.

My external lib is slow (with a large array) when the first element is removed, because all the others must be shifted. Removing the last one is always fast. Is Delete() sensitive in that regard?

Although your example seems not to work, it actually does. Deleting items will not shrink your array, but the array's references to the items are dropped, so they can get recycled.
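
A minimal sketch to make that visible with the existing API (assuming the array arrives as the first argument): after Delete(i), Has(i) turns false while Length() stays constant, exactly like in your output.

#include <napi.h>
#include <cstdio>

// Delete(i) leaves a hole: the element reference is dropped, but the
// array length does not change.
void InspectDelete(const Napi::CallbackInfo& info) {
    Napi::Array array = info[0].As<Napi::Array>();
    for (uint32_t i = 0; i < array.Length(); i++) {
        bool deleted = array.Delete(i);
        printf("deleted=%s has=%s length=%u\n",
               deleted ? "true" : "false",
               array.Has(i) ? "true" : "false",
               array.Length());
    }
}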

You should check memory usage at runtime, e.g. with the heapUsed value of process.memoryUsage().

bsrdjan commented 6 years ago

Thanks, I can confirm it works.