hashgraph / pbj

A performance optimized Google Protocol Buffers code generator, parser, and Gradle module.
Apache License 2.0
22 stars 6 forks source link

Add direct serialization to MessageDigest hashing #271

Open jasperpotts opened 2 months ago

jasperpotts commented 2 months ago

Problem

Today we have to serialize to a byte array before we can hash a PBJ protobuf object. This is wasteful in time and more importantly heap object garbage. For example we do:

EventCore.PROTOBUF.toBytes(event.getEventCore()).writeTo(eventDigest)

Solution

Add an overloaded write(MessageDigest digest) method to Codec interface and generated implementations. It can call the normal write(T o, WritableSequentialData out) use a custom implementation of WritableSequentialData that wraps a java.security.MessageDigest mapping the write(byte) and write(byte[]) methods to update(byte) and update(byte[]) methods on MessageDigest. This will avoid the creation of temporary byte arrays and ByteArrayOutputStreams.

Alternatives

No response

jasperpotts commented 2 months ago

After this is fixed, update PbjBytesHasher to use it.

lpetrovic05 commented 2 months ago

this is currently implemented with existing code:

private final MessageDigest eventDigest = DigestType.SHA_384.buildDigest();
private final WritableSequentialData eventStream = new WritableStreamingData(new HashingOutputStream(eventDigest));

EventCore.PROTOBUF.write(event.getEventCore(), eventStream);

Do you think there is any benefit in adding this new method? Other than some convenience?

jasperpotts commented 2 months ago

Probably fine, don't think there is a performance impact from first look other than extra object created. Adding the PBJ method would make code cleaner and give single point to performance test and improve over time. We know that PBJ serialization is expensive and if there is a more efficient way to hash a PBJ object in the future then that would be nice if there was only a couple APIs where we know it is done and they are used everywhere.