labstack / echo

High performance, minimalist Go web framework
https://echo.labstack.com
MIT License
29.01k stars 2.21k forks source link

proposal: Micro-optimization for better memory utilization #2632

Closed behnambm closed 1 month ago

behnambm commented 2 months ago

Hello,

I've noticed that there's a potential for optimizing certain structs by rearranging their fields. This optimization could notably reduce the padding added by the compiler, ensuring better memory utilization and potentially enhancing performance.

It's important to consider that alignment requirements vary between 32-bit and 64-bit architectures. With a focus on optimizing for 64-bit architecture, I believe there's significant room for improvement in this regard.

It's worth emphasizing that this optimization won't alter the program's logic; rather, it involves solely changing the location of fields within the structs.

If okay, I would like to work on this.

aldas commented 2 months ago

Sorry for the late reply. If you are have interest and time for it please go ahead.

Could you provide small summary of which structs you aim, maybe small example with benchmark. Maybe related links to (blog)posts about this kind of optimizations - so people that are not that knowledgeable could get some introductory into the matter.

If there are no performance gains it would be useful to show that and reason why we see numbers we see. This is useful less experienced coders.

I am a quite ignorant at these things, I sometimes look at micro-optimizations that other frameworks do and think how much this magic actually give them benefits. It does not mean we should not do them but to have honest conversation what the benefits are in numbers.

I would really like if these kind of things come in a form that everyone interested (even not Echo users) could learn something. Having more optimized code is good, but even better is having that and more learned people for Go ecosystem.

behnambm commented 1 month ago

Absolutely, sharing knowledge is key here, and I'm fully on board with that.


So let me start with a simple exlanation of what this micro optimization is.

Here we have two important concepts:

  1. Word Size: This refers to the number of bytes the CPU can process at one time. On a 64-bit system, the word size is 8 bytes, meaning the CPU can handle 8 bytes of data in a single operation. Ref
  2. Alignment: For the CPU to process data most efficiently, the data needs to be aligned in memory according to the word size. This means that data types should ideally begin at memory addresses that are multiples of their size, up to the word size. Ref

When struct fields are not naturally aligned with these boundaries, the Go compiler will automatically insert padding (extra space) between fields. This padding ensures that each field starts at an address that aligns with its size, optimizing memory access during runtime. And as a result when a data type is naturally aligned, the CPU fetches it in minimum read cycles.

Look at this struct:

type myStruct struct {
 myBool  bool    // 1 byte
 myFloat float64 // 8 bytes
 myInt   int32   // 4 bytes
}

And this is how the memory allocation for this struct will look like: image

The red space is the padding added by the compiler to complete a word.

If we just reorder the fields of that struct like this:

type myStructOptimized struct {
 myFloat float64 // 8 bytes
 myBool  bool    // 1 byte
 myInt   int32   // 4 bytes
}

We will get an allocation like this: image

And finally with just a reorder we can save 8 bytes.

Here's the code if you want to run it locally:

package main

import (
    "fmt"
    "unsafe"
)

type myStruct struct {
    myBool  bool    // 1 byte
    myFloat float64 // 8 bytes
    myInt   int32   // 4 bytes
}

type myStructOptimized struct {
    myFloat float64 // 8 bytes
    myBool  bool    // 1 byte
    myInt   int32   // 4 bytes
}

func main() {
    a := myStruct{}
    b := myStructOptimized{}

    fmt.Println("Size of myStruct:", unsafe.Sizeof(a))          //24
    fmt.Println("Size of myStructOptimized:", unsafe.Sizeof(b)) //16

}

References: Golang struct size and memory optimisation Alignment Structure Matters: Padding, Alignment, and Memory Optimization in Go Optimize Your Struct with Alignment Trick + Great Visualization How to Speed Up Your Struct in Golang

behnambm commented 1 month ago

Benchmark

I've been attempting to fine-tune the Echo framework to incorporate new benchmarks over the past few days. However, I encountered challenges due to the complexity of duplicating code required to establish a suitable benchmark environment. As a result, I decided to forgo benchmarking directly within the Echo framework itself.

I will drop this benchmark code here so you can run it locally too.

main.go

package main

type OptimizedUserStruct struct {
    Email    string // 16 bytes
    Balance  int64  // 8 bytes
    Rank     int32  // 4 bytes
    Age      uint16 // 2 bytes
    IsAdmin  bool   // 1 byte
    IsActive bool   // 1 byte
}

type UserStruct struct {
    IsAdmin  bool   // 1 byte + 7 padding
    Email    string // 16 bytes
    IsActive bool   // 1 byte + 7 padding
    Balance  int64  // 8 bytes
    Age      uint16 // 2 bytes + 2 padding
    Rank     int32  // 4 bytes
}

main_test.go

package main

import (
    "testing"
)

var OptimizedStructList []OptimizedUserStruct
var StructList []UserStruct

func BenchmarkUserStruct(b *testing.B) {
    b.ReportAllocs()
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        tmp := UserStruct{
            IsAdmin:  true,
            Email:    "email@example.com",
            IsActive: true,
            Balance:  42,
            Age:      42,
            Rank:     42,
        }
        StructList = append(StructList, tmp)
    }
}

func BenchmarkOptimizedUserStruct(b *testing.B) {
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        tmp := OptimizedUserStruct{
            Email:    "email@example.com",
            Balance:  42,
            Rank:     42,
            Age:      42,
            IsAdmin:  true,
            IsActive: true,
        }
        OptimizedStructList = append(OptimizedStructList, tmp)
    }
}

Result

go test -bench=. ./...

goos: linux
goarch: amd64
pkg: lab/padding
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkUserStruct-8                   10656973               109.0 ns/op           270 B/op          0 allocs/op
BenchmarkOptimizedUserStruct-8          46414730                53.81 ns/op          162 B/op          0 allocs/op
PASS
ok      lab/padding     4.164s

It's crucial to emphasize the title of the issue, which proposes a "micro" optimization. Therefore, we shouldn't anticipate a significant impact on performance. The effects of these types of optimizations become apparent at scale.

aldas commented 1 month ago

Thank you @behnambm !