vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.76k stars 196 forks source link

Document that only trusted data should be deserialized #270

Open Marcono1234 opened 2 years ago

Marcono1234 commented 2 years ago

It appears many of the classes which implement Serializable allocate memory based on a size value specified in the deserialized data, e.g. read an int as size value and create an array of that size. I assume this is done intentionally for good performance; however, it should be documented. Adversaries can abuse this to cause Denial of Service attacks. For example the Guava library received CVE-2018-10237 for the same issue. (I just want to point out here that this can be considered a security issue; I am not asking for fastutil to get a CVE or to change the implementation.)

It would therefore be good to at least in the README and the Javadoc main page describe that the deserialization implementation is designed for efficiency and should not be used for untrusted data.

vigna commented 2 years ago

OK, but most users would not even understand such a specific comment. Moreover, the advisor talks about "reasonable" sizes and fastutil is designed for very large data (e.g., BigArrayBigList), so it is not clear to me what the solution could be. You can always inject perfectly valid data that is too big.

Marcono1234 commented 2 years ago

You can always inject perfectly valid data that is too big.

Yes, that is not my concern. Serialization data which legitimately contains GBs of data is perfectly fine. It is the responsibility of the user then to limit the size of serialization data. My concern is that fastutil classes, such as IntArrayList allocate memory eagerly based on the encoded collection size, see related linked Guava CVE. Due to this, an adversary can with a few KBs of serialization data make fastutil allocate GBs of data before it even notices that the serialization data does not actually contain that many collection elements. For example:

/*
// Uses https://github.com/Marcono1234/serial-builder
byte[] serialData = SimpleSerialBuilder.startSerializableObject()
    .beginClassData("it.unimi.dsi.fastutil.ints.IntArrayList", -7046029254386353130L)
        .primitiveIntField("size", Integer.MAX_VALUE - 10)
        .writeObjectWith(writer -> {
        })
    .endClassData()
.endObject();
*/
byte[] serialData = {
    -84, -19, 0, 5, 115, 114, 0, 39, 105, 116, 46, 117, 110, 105, 109, 105, 46, 100, 115,
    105, 46, 102, 97, 115, 116, 117, 116, 105, 108, 46, 105, 110, 116, 115, 46, 73, 110,
    116, 65, 114, 114, 97, 121, 76, 105, 115, 116, -98, 55, 121, -71, 127, 74, 124, 22,
    3, 0, 1, 73, 0, 4, 115, 105, 122, 101, 120, 112, 127, -1, -1, -11, 120
};

new ObjectInputStream(new ByteArrayInputStream(serialData)).readObject();

Here the serialization data claims the size of the IntArrayList is Integer.MAX_VALUE - 10, and fastutil happily allocates an array of that size (causing OutOfMemoryError: Java heap space), before it would even notice that the serialization data does not actually contain any list elements.

This might be acceptable, given that the primary goal of this library is probably performance, and not pre-sizing the array would not be as efficient. However, users should be informed that the should not blindly deserialize classes of this library.

Alternatively you can of course implement deserialization in a safer way; this would also protect users which are not explicitly deserializing fastutil classes, but where fastutil is on the classpath, and therefore an adversary can reference its classes in the serialization data. But I cannot really suggest what "reasonable" initial collection sizes are, and getting this right for all corner cases is tricky.