oschwald / maxminddb-golang

MaxMind DB Reader for Go
ISC License
576 stars 99 forks source link

Improve decoding performance #92

Open vincentbernat opened 1 year ago

vincentbernat commented 1 year ago

Hey!

I am looking at improving performance on an app using heavily geoip lookup and I am wondering if there was something between using reflect and a custom deserializer (as introduced in #74). reflect can be slow and it should be possible to use something like DecodeFieldAsString(offset uintptr, fields ...string) string.

I didn't try to use the deserializer interface as it is not really documented. An alternative would be an example on how to use it. The associated tests are strictly technical, so it is hard to understand how difficult this is.

oschwald commented 1 year ago

For my use cases, decoding to a minimal struct that only has the fields I am interested in has been fast enough. Have you tried this already? It still involves reflection, but less of it and most of the time ends up being spent in doing the lookup in the binary tree.

As far as the deserializer interface goes, there is an example of how it is used in mmdbwriter. This sounds like it is likely a bit heavier than what you want here.

Similarly, there is this recent PR. I haven't had a chance to really look at it yet. From glancing at it, it seems basically equivalent to the deserializer; I am not completely sure what it adds and I am a bit reluctant to expand the public API that much.

As far as your particular suggestion goes, it sounds like you want something similar to the data lookup functions in libmaxminddb. I am not opposed to that necessarily, although I am reluctant to add a method per data type. Perhaps something could be done with generics, although it is hard to see how that would work without any reflection.

vincentbernat commented 1 year ago

For my use cases, decoding to a minimal struct that only has the fields I am interested in has been fast enough. Have you tried this already? It still involves reflection, but less of it and most of the time ends up being spent in doing the lookup in the binary tree.

I have done that earlier today. I have yet to test how much it helped with real data. I am confident it will (Country in geoip2-golang has many fields and I needed only one, ASN which I was also using didn't appear in the profile).

As far as the deserializer interface goes, there is an example of how it is used in mmdbwriter https://github.com/maxmind/mmdbwriter/blob/main/deserializer.go. This sounds like it is likely a bit heavier than what you want here.

Yes, this sounds very scary. But maybe because, like for tests, it is generic. I think you could make many of the methods panic if you only wanted to get only a few specific fields.

Similarly, there is this recent PR https://github.com/oschwald/maxminddb-golang/pull/91. I haven't had a chance to really look at it yet. From glancing at it, it seems basically equivalent to the deserializer; I am not completely sure what it adds and I am a bit reluctant to expand the public API that much.

I have also looked a bit and I reached the same conclusion as you.

As far as your particular suggestion goes, it sounds like you want something similar to the data lookup functions in |libmaxminddb| https://github.com/maxmind/libmaxminddb/blob/main/doc/libmaxminddb.md#data-lookup-functions. I am not opposed to that necessarily, although I am reluctant to add a method per data type. Perhaps something could be done with generics, although it is hard to see how that would work without any reflection.

I understand.

Another option would be to compile a struct to a function that would do the decoding. Easier said than done.

damz commented 1 year ago

I am looking at improving performance on an app using heavily geoip lookup and I am wondering if there was something between using reflect and a custom deserializer (as introduced in #74). reflect can be slow and it should be possible to use something like DecodeFieldAsString(offset uintptr, fields ...string) string.

@vincentbernat My PR #91 gives you pretty much that.

We are using this to decode data stored in a maxmind database to protobuf-defined structs.

From a protobuf message like this:

message ASN {
    uint32 number = 1;
    string name = 2;
}

Our code generator generates:

type ASN struct {
    // [...]

    Number uint32 `protobuf:"varint,1,opt,name=number,proto3" json:"number,omitempty"`
    Name   string `protobuf:"bytes,2,opt,name=name,proto3" json:"name,omitempty"`
}

func (m *ASN) DecodeMaxMind(d *maxmind.Decoder) error {
    return d.DecodeMap(func(key string, valueDecoder *maxmind.Decoder) error {
        switch key {
        case "number":
            {
                var err error
                m.Number, err = valueDecoder.DecodeUInt32()
                if err != nil {
                    return err
                }
            }
        case "name":
            {
                var err error
                m.Name, err = valueDecoder.DecodeString()
                if err != nil {
                    return err
                }
            }
        }
        return nil
    })
}

The implementation supports all the underlying types in a reasonable way (i.e. scalars decode to their corresponding Go types, maps and structs you can iterate on), and is fully tested.

vincentbernat commented 1 year ago

@damz Oh, understood! Yes, this interface seems simpler than the other one.

OTOH, after providing my own structures with far less fields, the performance using reflect is good enough for me as maxminddb is now a very minor contributor in the pprof profile.