I'm running code that pushes a total of over 1 GB of data into a MapDB database. The largest item in the sequence is roughly 106 MB. The specific method used to push the data is BTreeMap<String, byte[]>.put().
In the execution chain that follows, DataOutput2.grow() (called by ensureAvail(), which is called by write()) is responsible for ensuring enough buffer space is set aside to store the data. As the data sequence grows (split across multiple calls to put()) and the buffer (DataOutput2.buf) reaches 2^30 bytes (1 GB), the next power of 2 is computed for the new size. That computation overflows to a negative signed int, which is therefore not accepted as the newSize for the enlarged buffer.
The code of the DataOutput2 grow() function, for context:
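To make the failure mode concrete, here is a self-contained sketch of the size computation as I understand it from the debugger, not the verbatim MapDB source: the nextPowTwo body is my reconstruction of what DataIO.nextPowTwo appears to do, and the real byte[] allocation is replaced by plain int arithmetic so it runs standalone.

```java
// Sketch of the failing size computation (reconstructed, not MapDB's
// actual source). The essential point is that everything is a signed int.
class GrowOverflowDemo {

    // Round a up to the next power of 2. For a > 2^30 the shift amount
    // becomes 31, and 1 << 31 overflows to Integer.MIN_VALUE.
    static int nextPowTwo(int a) {
        return 1 << (32 - Integer.numberOfLeadingZeros(a - 1));
    }

    public static void main(String[] args) {
        int bufLength = 1 << 30;   // current buffer size: 1,073,741,824
        int n = 1_073_807_719;     // required size, as seen in the debugger

        int pow = nextPowTwo(n);
        // Because pow is negative, Math.max falls back to the old length
        // and the buffer is never actually enlarged.
        int newSize = Math.max(pow, bufLength);
        int sizeMask = 0xFFFFFFFF - (newSize - 1);

        System.out.println(pow);      // -2147483648 (overflowed)
        System.out.println(newSize);  // 1073741824 -- unchanged, too small
        System.out.println(Integer.toBinaryString(sizeMask));
        // 11000000000000000000000000000000 -- matches the observed sizeMask
    }
}
```

The sizeMask value this sketch produces matches the bit sequence I observed in the debugger, which is why I believe the reconstruction is faithful to the failing path.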
When the call fails, the IntelliJ debugger shows the following inputs and state values:
n: 1,073,807,719
buf.length: 1,073,741,824
At the end of the call, newSize is 1,073,741,824, which was already the size of buf and is not large enough to accommodate the desired new size n.
Since n is larger than buf.length, the expression DataIO.nextPowTwo(n) should give the next power of 2 (2,147,483,648). However, because DataIO.nextPowTwo returns an int, it cannot produce a value as large as 2,147,483,648: the bit pattern that would represent it instead represents a negative value (-2,147,483,648). I suspect we are hitting the limit of the buffer size that can be used here.
sizeMask was and remains the bit sequence 11000000000000000000000000000000. (I assume the intended size computation would flip one more bit to produce 10000000000000000000000000000000 and thus accommodate 2^31 elements, but it never got the chance to compute this because of the overflow in the nextPowTwo result.)
Is it user error to store enough byte buffers that the size calculation overflows, or should some earlier step in the BTreeMap<String, byte[]>.put() chain have divided the underlying data into smaller sub-buffers whose lengths fit inside a regular Java int?
I believe the grow() method should reject invalid values, or better yet, no buffer exceeding 2^31-1 bytes should be used at all; and if buffers up to that size are to be permitted, the code needs to account for the fact that 2^31 is not representable as a signed int.
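In the meantime, a caller-side workaround I'm considering is to chunk large values before calling put(). A minimal sketch; the putChunked helper, the chunk size, and the "key#i" naming are all my own invention for illustration, not anything MapDB provides:

```java
import java.util.Arrays;
import java.util.Map;

// Caller-side workaround sketch (hypothetical, not a MapDB API): split
// each large value into chunks well below the 2^30 danger zone and store
// them under derived keys ("key#0", "key#1", ...).
class ChunkedPut {

    static void putChunked(Map<String, byte[]> map, String key,
                           byte[] value, int chunkSize) {
        // Number of chunks, rounding up for a partial final chunk.
        int chunks = (value.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < chunks; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, value.length);
            // copyOfRange allocates a fresh sub-array for each chunk.
            map.put(key + "#" + i, Arrays.copyOfRange(value, from, to));
        }
        // Record the chunk count so a reader knows how many pieces to fetch.
        map.put(key + "#count", Integer.toString(chunks).getBytes());
    }
}
```

Whether this actually sidesteps the overflow depends on how MapDB packs values into serialized nodes, so I'd treat it only as a stopgap until the int arithmetic in grow()/nextPowTwo is hardened.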
Here is the exception and stack trace as it occurs in my code (I cut off the bottom of the stack trace, which is my code calling mapDB):
java.lang.ArrayIndexOutOfBoundsException: arraycopy: last destination index 1073807719 out of bounds for byte[1073741824]
at java.base/java.lang.System.arraycopy(Native Method)
at org.mapdb.DataOutput2.write(DataOutput2.java:65)
at org.mapdb.DataOutput2.write(DataOutput2.java:58)
at org.mapdb.serializer.SerializerByteArray.serialize(SerializerByteArray.java:20)
at org.mapdb.serializer.SerializerByteArray.serialize(SerializerByteArray.java:13)
at org.mapdb.serializer.SerializerByteArray.valueArraySerialize(SerializerByteArray.java:76)
at org.mapdb.BTreeMapJava$NodeSerializer.serialize(BTreeMapJava.java:171)
at org.mapdb.BTreeMapJava$NodeSerializer.serialize(BTreeMapJava.java:136)
at org.mapdb.StoreDirectAbstract.serialize(StoreDirectAbstract.kt:245)
at org.mapdb.StoreDirect.update(StoreDirect.kt:632)
at org.mapdb.BTreeMap.put2(BTreeMap.kt:408)
at org.mapdb.BTreeMap.put(BTreeMap.kt:292)
Please let me know if I can provide additional information, or if I can assist in resolving this. I was initially hoping to try to solve this myself, but I'm unfamiliar with the overall design and I think someone familiar with this should be aware of the problem.
Using Java 11.43.55, and mapDB 3.1.0 or 3.0.10.