to11mtm opened 10 months ago
Potentially another piece of savings:

I know it'd be a breaking API change, but do we really need `Size`? It seems like that is maybe left over from an earlier iteration of the struct; I'm not sure why the end user needs to know the raw message size. The library can calculate it where needed; it doesn't need to be on `NatsMsg<T>`, I think.
> Potentially another piece of savings:
>
> I know it'd be a breaking API change, but do we really need `Size`? It seems like that is maybe left over from an earlier iteration of the struct; I'm not sure why the end user needs to know the raw message size. The library can calculate it where needed; it doesn't need to be on `NatsMsg<T>`, I think.
I can think of reasons, but I'm not a normal end user 😅.

Getting rid of `Size` would get us back 8 bytes on `NatsMsg<T>` for a reference-type `T`. However, then we aren't as nicely aligned on 64B/128B cache lines. If we made such a change, we'd likely want to investigate whether keeping `NatsConnection` as a direct field on the struct (since that appears to be provided in almost, if not every, call to the `Build` method or constructor) would be better; then we would only allocate if `ReplyTo` and `Headers` are provided, and even then just 16B plus an object header.
> I know it'd be a breaking API change, but do we really need `Size`?
I think this is the worst type of breaking change where you remove a feature. We might get away with it because it's early days.
I guess I can see the benefits of `Size` in certain cases. But the current implementation only sets a `Size` on incoming messages. What about making a method that could compute it?

```csharp
public interface INatsConnection
{
    int GetMsgSize<T>(NatsMsg<T> msg, INatsSerialize<T>? serializer);
}
```
As an optimization, we could maintain a look-up table of the last N (maybe 1000?) messages that were received on the connection, so calling `GetMsgSize` on one received recently could return quickly.
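A rough sketch of how such a computation might work, assuming the `INatsSerialize<T>.Serialize(IBufferWriter<byte>, T)` shape from NATS.NET; the extension-method form and the size accounting here are assumptions, not a proposed API:

```csharp
using System.Buffers;
using NATS.Client.Core;

public static class NatsMsgSizeExtensions
{
    // Hypothetical: recompute an approximate wire size on demand instead of
    // storing Size on NatsMsg<T>. Note that serializing previously
    // deserialized data is not always invertible, so this may differ from
    // the size of the message as originally received.
    public static int GetMsgSize<T>(this NatsMsg<T> msg, INatsSerialize<T> serializer)
    {
        var payloadLength = 0;
        if (msg.Data is not null)
        {
            var writer = new ArrayBufferWriter<byte>();
            serializer.Serialize(writer, msg.Data);
            payloadLength = writer.WrittenCount;
        }

        // Subject + reply-to + payload; headers would need the real header
        // writer to be counted accurately, so they are omitted here.
        return msg.Subject.Length + (msg.ReplyTo?.Length ?? 0) + payloadLength;
    }
}
```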
Interesting idea!

> What about making a method that could compute it?

How? Serialize previously deserialized message data? That's not always invertible.

> As an optimization, we could maintain a look-up table

How do we identify a given message?

...just trying to understand 😃
Just a note on the message size: it's there because we needed to check the total size for pull requests when consuming.

Also, what about the connection? Couldn't we have something like `INatsConnection.ReplyTo(msg)` instead of carrying the reference around?
> How? Serialize previously deserialized message data? That's not always invertible.

Yes, that's what we would have to do. True, we don't track the deserializer (which may not even have a complementing serializer), so you'd either have to supply the matching serializer that you used, or fall back to the connection default.

> How do we identify a given message?

It wouldn't be trivial. We'd have to use some sort of look-up on a hash of the `Data` object or something, and we could easily have misses in that case.
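For illustration, one hedged way to key such a look-up for reference-type payloads is by the identity of the deserialized `Data` object; value types and re-deserialized duplicates would still miss, and all names here are assumptions:

```csharp
using System.Runtime.CompilerServices;

// Hypothetical sketch: remember sizes for recently received messages by
// keying on the deserialized Data instance (reference types only).
// ConditionalWeakTable does not keep its keys alive, so entries disappear
// once the message data is collected -- it's bounded by the GC, not by N.
internal sealed class ReceivedSizeCache
{
    private readonly ConditionalWeakTable<object, StrongBox<int>> _sizes = new();

    public void Remember(object data, int size) =>
        _sizes.AddOrUpdate(data, new StrongBox<int>(size));

    public bool TryGetSize(object data, out int size)
    {
        if (_sizes.TryGetValue(data, out var box))
        {
            size = box.Value;
            return true;
        }

        size = 0;
        return false;
    }
}
```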
> Also, what about the connection? Couldn't we have something like `INatsConnection.ReplyTo(msg)` instead of carrying the reference around?

I guess we could, but then we'd lose the convenience of being able to do `msg.ReplyTo`.
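If the connection reference were dropped from the struct, some of that convenience could be recovered with an extension on the connection; a hypothetical sketch (the helper name and error handling are assumptions, built on the existing `INatsConnection.PublishAsync`):

```csharp
using NATS.Client.Core;

public static class NatsConnectionReplyExtensions
{
    // Hypothetical: reply via the connection for a NatsMsg<T> that no
    // longer carries its connection reference.
    public static ValueTask ReplyAsync<T, TReply>(
        this INatsConnection connection,
        in NatsMsg<T> msg,
        TReply data,
        CancellationToken cancellationToken = default)
    {
        if (msg.ReplyTo is not { Length: > 0 } replyTo)
            throw new NatsException("Message has no reply-to subject.");

        // PublishAsync is the existing publish API on INatsConnection.
        return connection.PublishAsync(replyTo, data, cancellationToken: cancellationToken);
    }
}
```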
> Just a note on the message size: it's there because we needed to check the total size for pull requests when consuming.

Maybe we should just leave it then, and note that it's only automatically set when a message is received. Same with `Connection`.
FWIW, these are the current object layouts on x64, based on the struct as of 5f9dc5a:
`T` = `Int32`:

```
Type layout for 'NatsMsg`1'
Size: 40 bytes. Paddings: 0 bytes (%0 of empty space)
|==============================================================|
| 0-7: String <Subject>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 8-15: String <ReplyTo>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 16-23: NatsHeaders <Headers>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 24-31: INatsConnection <Connection>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 32-35: Int32 <Size>k__BackingField (4 bytes) |
|--------------------------------------------------------------|
| 36-39: Int32 <Data>k__BackingField (4 bytes) |
|==============================================================|
```
`T` = `String` (or any other reference type or 8-byte struct):

```
Type layout for 'NatsMsg`1'
Size: 48 bytes. Paddings: 4 bytes (%8 of empty space)
|==============================================================|
| 0-7: String <Subject>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 8-15: String <ReplyTo>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 16-23: NatsHeaders <Headers>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 24-31: String <Data>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 32-39: INatsConnection <Connection>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 40-43: Int32 <Size>k__BackingField (4 bytes) |
|--------------------------------------------------------------|
| 44-47: padding (4 bytes) |
|==============================================================|
```
`T` = `NatsMemoryOwner<Byte>`:

```
Type layout for 'NatsMsg`1'
Size: 64 bytes. Paddings: 4 bytes (%6 of empty space)
|==============================================================|
| 0-7: String <Subject>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 8-15: String <ReplyTo>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 16-23: NatsHeaders <Headers>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 24-31: INatsConnection <Connection>k__BackingField (8 bytes) |
|--------------------------------------------------------------|
| 32-35: Int32 <Size>k__BackingField (4 bytes) |
|--------------------------------------------------------------|
| 36-39: padding (4 bytes) |
|--------------------------------------------------------------|
| 40-63: NatsMemoryOwner`1 <Data>k__BackingField (24 bytes) |
| |====================================| |
| | 0-7: ArrayPool`1 _pool (8 bytes) | |
| |------------------------------------| |
| | 8-15: Byte[] _array (8 bytes) | |
| |------------------------------------| |
| | 16-19: Int32 _start (4 bytes) | |
| |------------------------------------| |
| | 20-23: Int32 _length (4 bytes) | |
| |====================================| |
|==============================================================|
```
I did just have a possibly *terrifying* idea... I still need to check whether this works with how JetStream uses things, but the idea is that we could use a wrapper around `ChannelReader` so that the final `NatsMsg<T>`, including the connection, is only produced on the actual read from the channel. Here's a simple POC of the wrapper.

This wouldn't necessarily shrink the final size of `NatsMsg<T>`; however, it would let us use a smaller structure in our channel, which would help with the largest concerns around overall size and poor cache-line alignment in the internal buffers.
Edit: Looks like `NatsJSMsg<T>` would work pretty well with this change, possibly even better, because the connection is already on the `NatsJSContext`, so we wouldn't even need to wrap the channel.
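A minimal sketch of what such a wrapper might look like; the slim channel item `SlimMsg<T>` and the `NatsMsg<T>` constructor shape are assumptions, not the actual internals:

```csharp
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using NATS.Client.Core;

// Assumed slim item carried through the channel; 8 bytes smaller than
// NatsMsg<T> on x64 because it drops the connection reference.
internal readonly record struct SlimMsg<T>(
    string Subject, string? ReplyTo, int Size, NatsHeaders? Headers, T? Data);

// Wrapper that attaches the connection only when a message is read.
internal sealed class ConnectionAttachingReader<T> : ChannelReader<NatsMsg<T>>
{
    private readonly ChannelReader<SlimMsg<T>> _inner;
    private readonly INatsConnection _connection;

    public ConnectionAttachingReader(ChannelReader<SlimMsg<T>> inner, INatsConnection connection)
    {
        _inner = inner;
        _connection = connection;
    }

    public override bool TryRead(out NatsMsg<T> msg)
    {
        if (_inner.TryRead(out var slim))
        {
            // Build the full public message only at the point of consumption
            // (constructor argument order assumed from the field layout).
            msg = new NatsMsg<T>(slim.Subject, slim.ReplyTo, slim.Size, slim.Headers, slim.Data, _connection);
            return true;
        }

        msg = default;
        return false;
    }

    public override ValueTask<bool> WaitToReadAsync(CancellationToken cancellationToken = default)
        => _inner.WaitToReadAsync(cancellationToken);
}
```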
I guess if we take it a step further, we could also wait to call `NatsMsg.Build` until the message is read off the channel, meaning deserialization would happen at that point too. I think for status checking such as No Responders we would actually want to deserialize the headers before putting the message in the channel, but that means we also have to track `size`:

```csharp
string subject,
string? replyTo,
NatsHeaders? headers,
ReadOnlySequence<byte> payloadData,
int size,
```

One benefit I could see here is that deserialization would happen outside of the read-processing loop, so it could make use of more cores.
> I guess if we take it a step further, we could also wait to call `NatsMsg.Build` until the message is read off the channel, meaning deserialization would happen at that point too. I think for status checking such as No Responders we would actually want to deserialize the headers before putting the message in the channel, but that means we also have to track `size`: `string subject, string? replyTo, NatsHeaders? headers, ReadOnlySequence<byte> payloadData, int size,`
>
> One benefit I could see here is that deserialization would happen outside of the read-processing loop, so it could make use of more cores.
Two concerns with that: `ReadOnlySequence<byte>` is 24 bytes on x64, so we'd wind up with a larger struct.

However, I think I got some inspiration today... I'll update my gist over the weekend or just push a branch out for feedback.
Proposed change
In #276, the size of `NatsMsg<T>` came up as a concern for large buffer sizes, and @mtmk raised the question there. The zero-allocation case is worthy (as is keeping our APIs stable), so it may be worth investigating opportunities to save on the size of the struct while paying only a (smaller) allocation penalty for additional features.
Essentially, rather than having separate `ReplyTo`, `Headers`, and `Connection` properties, we can use a single `private object _data` field that acts as a form of discriminated union (DU): if only one of the above is needed, it is placed directly in the field; if additional members are needed, the appropriate container object(s) are allocated. The getters and init accessors will of course have to deal with the behavior of the DU (i.e., as-type checking to get to the appropriate result), but that shaves 16 bytes off the struct (at a cost of up to 16-24 bytes plus an object header allocated in the unhappy cases).

Use case
The big win for the trouble is a 33% reduction in the size of our struct (based on ObjectLayoutInspector showing 48 bytes for `NatsMsg<string>`). Our new struct size of 32 bytes is a nice power of two as well, so we're also less likely to cross cache lines on writes and reads to `BoundedChannel<T>`'s underlying buffer (and we now fit 2 messages to a line instead of 1.x).

The potential downsides:
Contribution
I am happy to contribute, including building a proof-of-concept branch if it will help decide whether this proposal is viable.
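For illustration, a minimal sketch of the single-field DU described in the proposal; all type and member names here are assumptions, not the proposed API:

```csharp
using NATS.Client.Core;

// Sketch of the DU idea: one object field holds whichever optional member
// is present; a container is allocated only when several are needed.
public readonly struct SlimNatsMsg<T>
{
    // Holds: null | string (ReplyTo) | NatsHeaders | INatsConnection | Extras.
    private readonly object? _extra;

    public SlimNatsMsg(string subject, T? data, object? extra)
    {
        Subject = subject;
        Data = data;
        _extra = extra;
    }

    public string Subject { get; }

    public T? Data { get; }

    public string? ReplyTo => _extra switch
    {
        string replyTo => replyTo,
        Extras extras => extras.ReplyTo,
        _ => null,
    };

    public NatsHeaders? Headers => _extra switch
    {
        NatsHeaders headers => headers,
        Extras extras => extras.Headers,
        _ => null,
    };

    public INatsConnection? Connection => _extra switch
    {
        INatsConnection connection => connection,
        Extras extras => extras.Connection,
        _ => null,
    };

    // Allocated only in the "unhappy" case where multiple extras are set:
    // up to 24 bytes of fields plus the object header.
    private sealed class Extras
    {
        public string? ReplyTo;
        public NatsHeaders? Headers;
        public INatsConnection? Connection;
    }
}
```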