planetscale / vtprotobuf

A Protocol Buffers compiler that generates optimized marshaling & unmarshaling Go code for ProtoBuf APIv2
BSD 3-Clause "New" or "Revised" License
912 stars 78 forks source link

ResetVT does not work with oneof fields #144

Open codesoap opened 2 months ago

codesoap commented 2 months ago

Problem

When using pools, the ResetVT method is able to reuse allocated memory of bytes. It looks like this with a minimal example:

syntax = "proto2";
package TEST;
option go_package = "./testpkg";

message Foo {
    required bytes raw = 1;
}
func (m *Foo) ResetVT() {
    if m != nil {
        f0 := m.Raw[:0]
        m.Reset()
        m.Raw = f0
    }
}

When the bytes are inside a oneof field, the memory is no longer reused:

syntax = "proto2";
package TEST;
option go_package = "./testpkg";

message Foo {
    oneof data {
        bytes raw = 1;
        bytes zlib_data = 2;
    }
}
func (m *Foo) ResetVT() {
    if m != nil {
        m.Reset()
    }
}

Use Case

This is a real problem for me when parsing open streetmap's PBF files, because they use this construct for large blobs of data that occur hundreds to thousands of times in a PBF file: https://github.com/openstreetmap/OSM-binary/blob/65e7e976f5c8e47f057a0d921639ea8e6309ef06/osmpbf/fileformat.proto#L38

Maybe it would have been better if they used a single bytes field and a separate type field, but I'm afraid the design is set in stone as the format is already widely used.

Suggested Goal

Unfortunately I don't understand the code well enough to provide a pull request, but this is how I would imagine the generated code to look like in order to fix the problem:

func (m *Foo) ResetVT() {
    if m != nil {
        var f0 isFoo_Data
        switch t := m.Data.(type) {
        case *Foo_Raw:
            t.Raw = t.Raw[:0]
            f0 = t
        case *Foo_ZlibData:
            t.ZlibData = t.ZlibData[:0]
            f0 = t
        }
        m.Reset()
        m.Data = f0
    }
}

Inside the UnmarshalVT method, the code would look something like this; As a first step I'm only reusing the memory, if the Data field previously had the same type:

...
            var v []byte
            if m.Data == nil {
                v = make([]byte, postIndex-iNdEx)
            } else {
                switch t := m.Data.(type) {
                case *Foo_Raw:
                    if cap(t.Raw) < postIndex-iNdEx {
                        v = make([]byte, postIndex-iNdEx)
                    } else {
                        v = t.Raw[:postIndex-iNdEx]
                    }
                default:
                    v = make([]byte, postIndex-iNdEx)
                }
            }
            copy(v, dAtA[iNdEx:postIndex])
            m.Data = &Foo_Raw{Raw: v}
...
vmg commented 1 month ago

This suggested generated code seems reasonable to me, but I do feel like it's gonna be tough to implement. We can merge this improvement if you get it working, and I think you're in the right direction. The right way to tackle this is starting with the generated code you're expecting to see and then tweaking the codegen until you get the desired results. :)

codesoap commented 1 month ago

I'm afraid my pain is not high enough to attempt implementing the feature at the moment. Maybe I will find more motivation in the future. I'll leave it to you to close this issue for now or keep it open.