(In my test doing minPartSize chunks also resulted in greater speed, so I'm not sure why larger sizes are considered more optimal?)
You should always make sure that the size of the stream is known for the AWS S3 APIs to work in an optimal manner. If it isn't, we fall back to a buffering technique where the maximum object size assumed is 5TB (for lack of knowing the file size). If you do know the file size, wrap the reader in a struct with a Len() function and optimalPartInfo() will calculate the right value @sb10.
So this is expected behavior. Also, the buffering technique only uses a 576MB re-usable buffer; are you saying you don't have more than 1GB of memory on the system where you are using this?
@harshavardhana I don't know the size of the file beforehand; the use case here is creating data that is too big to fit on the local hard drive and instead directly streaming it to S3.
From my brief reading of the code a new ~600MB buffer is created for every call to PutObjectStreaming(), so if someone wanted to stream multiple files simultaneously they'd run out of memory even on large memory systems.
But I'm running out of memory streaming just 1 file on a 4GB system. According to free -h, I go from 68MB used to 2.7GB used (~900MB available) during the stream of a 1GB file. I run out of memory during a subsequent unrelated call. This may be due to being forced to buffer the writes while waiting for minio-go to read, combined with the data generator holding an input buffer of 1GB in my test case. But this change to minio-go fixes the problem:
$ git diff upstream/master
diff --git a/api-put-object-common.go b/api-put-object-common.go
index 213fc21..fb26417 100644
--- a/api-put-object-common.go
+++ b/api-put-object-common.go
@@ -56,19 +56,23 @@ func isReadAt(reader io.Reader) (ok bool) {
// maxMultipartPutObjectSize - 5TiB
//
func optimalPartInfo(objectSize int64) (totalPartsCount int, partSize int64, lastPartSize int64, err error) {
- // object size is '-1' set it to 5TiB.
- if objectSize == -1 {
- objectSize = maxMultipartPutObjectSize
- }
// object size is larger than supported maximum.
if objectSize > maxMultipartPutObjectSize {
err = ErrEntityTooLarge(objectSize, maxMultipartPutObjectSize, "", "")
return
}
- // Use floats for part size for all calculations to avoid
- // overflows during float64 to int64 conversions.
- partSizeFlt := math.Ceil(float64(objectSize / maxPartsCount))
- partSizeFlt = math.Ceil(partSizeFlt/minPartSize) * minPartSize
+ var partSizeFlt float64
+ if objectSize == -1 {
+ // object size is '-1' set it to 5TiB and make our part size the minimum
+ // size, so that our read buffer isn't unnecessarily large
+ partSizeFlt = minPartSize
+ objectSize = maxMultipartPutObjectSize
+ } else {
+ // Use floats for part size for all calculations to avoid
+ // overflows during float64 to int64 conversions.
+ partSizeFlt = math.Ceil(float64(objectSize / maxPartsCount))
+ partSizeFlt = math.Ceil(partSizeFlt/minPartSize) * minPartSize
+ }
// Total parts count.
totalPartsCount = int(math.Ceil(float64(objectSize) / partSizeFlt))
// Part size.
Now, during the stream of a 1GB file I go from 68MB used to 1.3GB used (2.3GB available), and don't run out of memory.
(If it makes any difference, I'm using io.Pipe() to connect the io.Writer of my data generation to the io.Reader of PutObjectStreaming().)
> @harshavardhana I don't know the size of the file beforehand; the use case here is creating data that is too big to fit on the local hard drive and instead directly streaming it to S3.
> From my brief reading of the code a new ~600MB buffer is created for every call to PutObjectStreaming(), so if someone wanted to stream multiple files simultaneously they'd run out of memory even on large memory systems.
> But I'm running out of memory streaming just 1 file on a 4GB system. According to free -h I go from 68MB used to 2.7GB used (~900MB available) during the stream of a 1GB file. I run out of memory during a subsequent unrelated call. This may be due to being forced to buffer the writes while waiting for minio-go to read, combined with the data generator holding an input buffer of 1GB in my test case. But this change to minio-go fixes the problem:
It is incorrect to assume a lower size just to make sure that it fits into your memory. I agree that the crash is incorrect and that we should be using at most ~600MB. Unfortunately, on AWS S3 it is not possible to upload a file for which the content length is not known, so we optimize based on the available information to provide a best-effort mechanism.
> From my brief reading of the code a new ~600MB buffer is created for every call to PutObjectStreaming(), so if someone wanted to stream multiple files simultaneously they'd run out of memory even on large memory systems.
I see now you are right and it should be avoided; there should be a fixed buffer which is re-used. The problem is caused by the usage of bytes.Buffer, which does 2x allocation growth. The code can be modified to have a capped behavior. On the other hand, reducing the optimal part size is not an acceptable solution.
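For illustration only, a minimal sketch of the kind of "capped" behavior being discussed, not the library's actual code: when the part size is known up front, the bytes.Buffer can be given its full capacity at creation and fed with io.CopyN, so its grow-by-doubling path is never hit. The readOnePart helper is a hypothetical name for this sketch.

package main

import (
	"bytes"
	"io"
	"os"
)

// readOnePart reads at most partSize bytes from r into a pre-sized buffer.
func readOnePart(r io.Reader, partSize int64) (*bytes.Buffer, error) {
	// Pre-allocating the full capacity means Write never reallocates,
	// so memory use stays at partSize instead of roughly 2x partSize.
	buf := bytes.NewBuffer(make([]byte, 0, partSize))
	if _, err := io.CopyN(buf, r, partSize); err != nil && err != io.EOF {
		return nil, err
	}
	return buf, nil
}

func main() {
	// Example: read a single 64MB part from stdin.
	if _, err := readOnePart(os.Stdin, 64*1024*1024); err != nil {
		panic(err)
	}
}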
Once this PR is merged https://github.com/minio/minio-go/pull/728 I will look at the memory problems of choosing a 600MB part size. Avoiding bytes.Buffer should solve this problem; will keep you posted @sb10.
This is not fixed. It actually got worse: my workaround of calling debug.FreeOSMemory() before doing the next thing no longer works, because now it fails directly in the minio-go function call. Tested with the latest commit 1a09415eed6025360c2c3142f92d3bcf339f873d.
runtime: out of memory
runtime stack:
runtime.throw(0xb0009d, 0x16)
/home/ubuntu/go/src/runtime/panic.go:596 +0x95
runtime.sysMap(0xc420d40000, 0x24000000, 0x0, 0xdda7d8)
/home/ubuntu/go/src/runtime/mem_linux.go:216 +0x1d0
runtime.(*mheap).sysAlloc(0xdc0000, 0x24000000, 0x12000)
/home/ubuntu/go/src/runtime/malloc.go:440 +0x34e
runtime.(*mheap).grow(0xdc0000, 0x12000, 0x0)
/home/ubuntu/go/src/runtime/mheap.go:774 +0x71
runtime.(*mheap).allocSpanLocked(0xdc0000, 0x12000, 0xe)
/home/ubuntu/go/src/runtime/mheap.go:678 +0x421
runtime.(*mheap).alloc_m(0xdc0000, 0x12000, 0x100000000, 0xffffffffffffffff)
/home/ubuntu/go/src/runtime/mheap.go:562 +0xe2
runtime.(*mheap).alloc.func1()
/home/ubuntu/go/src/runtime/mheap.go:627 +0x4b
runtime.systemstack(0x7f46357f9e00)
/home/ubuntu/go/src/runtime/asm_amd64.s:343 +0xab
runtime.(*mheap).alloc(0xdc0000, 0x12000, 0x289010100000000, 0x7f463fbd5110)
/home/ubuntu/go/src/runtime/mheap.go:628 +0xa0
runtime.largeAlloc(0x24000000, 0xc420282801, 0xc4202827e0)
/home/ubuntu/go/src/runtime/malloc.go:807 +0x93
runtime.mallocgc.func1()
/home/ubuntu/go/src/runtime/malloc.go:702 +0x3e
runtime.systemstack(0xc42001f300)
/home/ubuntu/go/src/runtime/asm_amd64.s:327 +0x79
runtime.mstart()
/home/ubuntu/go/src/runtime/proc.go:1132
goroutine 3585 [running]:
runtime.systemstack_switch()
/home/ubuntu/go/src/runtime/asm_amd64.s:281 fp=0xc420282720 sp=0xc420282718
runtime.mallocgc(0x24000000, 0xa52f80, 0xc420282801, 0x437098)
/home/ubuntu/go/src/runtime/malloc.go:703 +0x753 fp=0xc4202827c8 sp=0xc420282720
runtime.makeslice(0xa52f80, 0x24000000, 0x24000000, 0x8000000, 0x0, 0x0)
/home/ubuntu/go/src/runtime/slice.go:54 +0x7c fp=0xc420282818 sp=0xc4202827c8
github.com/minio/minio-go.glob..func1(0xdb04c0, 0x0)
/home/ubuntu/work/src/github.com/minio/minio-go/api-put-object-multipart.go:59 +0x7f fp=0xc420282880 sp=0xc420282818
sync.(*Pool).Get(0xdb04c0, 0xc4200145a0, 0xc420014570)
/home/ubuntu/go/src/sync/pool.go:144 +0x105 fp=0xc4202828d8 sp=0xc420282880
github.com/minio/minio-go.Client.putObjectMultipartNoStream(0xc4206331c0, 0x5, 0x0, 0x0, 0x0, 0xc4206331c8, 0x10, 0x0, 0x0, 0x0, ...)
/home/ubuntu/work/src/github.com/minio/minio-go/api-put-object-multipart.go:110 +0x390 fp=0xc4202830a8 sp=0xc4202828d8
github.com/minio/minio-go.Client.putObjectMultipart(0xc4206331c0, 0x5, 0x0, 0x0, 0x0, 0xc4206331c8, 0x10, 0x0, 0x0, 0x0, ...)
/home/ubuntu/work/src/github.com/minio/minio-go/api-put-object-multipart.go:37 +0x10c fp=0xc420283378 sp=0xc4202830a8
github.com/minio/minio-go.Client.putObjectCommon(0xc4206331c0, 0x5, 0x0, 0x0, 0x0, 0xc4206331c8, 0x10, 0x0, 0x0, 0x0, ...)
/home/ubuntu/work/src/github.com/minio/minio-go/api-put-object.go:203 +0x305 fp=0xc420283610 sp=0xc420283378
github.com/minio/minio-go.Client.PutObjectWithProgress(0xc4206331c0, 0x5, 0x0, 0x0, 0x0, 0xc4206331c8, 0x10, 0x0, 0x0, 0x0, ...)
/home/ubuntu/work/src/github.com/minio/minio-go/api-put-object.go:184 +0x18f fp=0xc4202837a8 sp=0xc420283610
github.com/minio/minio-go.Client.PutObjectStreaming(0xc4206331c0, 0x5, 0x0, 0x0, 0x0, 0xc4206331c8, 0x10, 0x0, 0x0, 0x0, ...)
/home/ubuntu/work/src/github.com/minio/minio-go/api-put-object-streaming.go:31 +0xf8 fp=0xc420283938 sp=0xc4202837a8
github.com/[mycode].UploadData(0xc420016960, 0xd835e0, 0xc420078138, 0xc4202094e0, 0x13, 0x487703, 0x59888cac)
/home/ubuntu/work/src/github.com/[mycode].go:290 +0x147 fp=0xc420283bb0 sp=0xc420283938
[...]
[..., lots and lots of:]
goroutine 1904 [select]:
github.com/minio/minio-go.Client.GetObject.func1(0xc42045b500, 0xc42045b5c0, 0xc42045be00, 0xc42067c920, 0xc4205928c0, 0xc42067c930, 0xc420618fc0, 0xc42001c070, 0x4, 0xc420221440, ...)
/home/ubuntu/work/src/github.com/minio/minio-go/api-get-object.go:89 +0x1b18
created by github.com/minio/minio-go.Client.GetObject
/home/ubuntu/work/src/github.com/minio/minio-go/api-get-object.go:233 +0x301
Where mycode line 290 is:
_, err := a.client.PutObjectStreaming(a.bucket, dest, data)
> This is not fixed. It actually got worse (my workaround of calling debug.FreeOSMemory() before doing the next thing no longer works because now it fails directly in the minio-go function call). Tested with latest commit 1a09415 .
The memory is protected by pooling the buffers; you can't use debug.FreeOSMemory(), which is a hack. The memory cannot be capped in the same way as in traditional C programs. There is no way to optimize this on our end, since the AWS S3 API doesn't provide an API where the content length is not provided; it means we need to know the length of the object we are going to upload, and we have to upload that exact size. What we are doing right now in this library is handling a scenario which no other AWS SDK handles. We tried using temporary files, but even that has challenges on systems low on disk space.
I would suggest an architecture change on your end, from what you are doing right now, so that you do not provide minio-go a stream with no length. If you do provide the final length, then we know how much memory to allocate optimally. You can implement a custom io.Reader which provides a Len() or Size() method that returns the size of the underlying io.Reader; that simplifies the behavior here.
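For illustration, a minimal sketch of the suggested wrapper, assuming (as described above) that the library looks for a Size() or Len() method on the reader. sizedReader and newSizedReader are hypothetical names for this example, not part of minio-go.

package main

import (
	"io"
	"strings"
)

// sizedReader wraps any io.Reader together with its known total length.
type sizedReader struct {
	io.Reader
	size int64
}

// Size reports the total number of bytes the wrapped reader will produce,
// which lets the library pick a part size instead of assuming 5TiB.
func (s *sizedReader) Size() int64 { return s.size }

func newSizedReader(r io.Reader, size int64) *sizedReader {
	return &sizedReader{Reader: r, size: size}
}

func main() {
	data := "known-length payload"
	r := newSizedReader(strings.NewReader(data), int64(len(data)))
	// r would then be passed to the upload call in place of the bare reader.
	_ = r
}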
I'll look into exactly what minio-go is doing tomorrow, but what stops you from uploading in user-defined or small chunks? If you only had a few 64MB buffers, I wouldn't run out of memory.
Is there a provable benefit to having buffers larger than this size?
> I'll look in to exactly what minio-go is doing tomorrow, but what stops you from uploading in user-defined or small chunks? If you only had a few 64MB buffers, I wouldn't run out of memory.
The APIs would need to be changed to address this, and that would come across as a low-level API. Now, if you do know that 64MB is good enough per chunk, then you are sort of aware of the final length of the object that you are going to upload. So I would suggest wrapping the reader and implementing a Size() function, which would then be used internally to calculate the optimal chunk size.
If you wish to use low-level APIs then you can simply use http://godoc.org/github.com/minio/minio-go#Core - here chunking and memory management is all yours; you can decide when to do multipart etc.
> Is there a provable benefit to having buffers larger than this size?
The API needs to cater for all use cases: if it supports a no-length io.Reader, then it should support uploading up to 5TB (the maximum allowable object size for AWS S3). In that case, 500MB+ per part is the most optimal number, based on the 10,000-part limit of AWS S3.
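For reference, a minimal sketch of where the 603979776-byte (~576MB) figure seen in the logs below comes from, assuming the constants in use at the time: a 5TiB maximum object size, a 10,000-part limit, and a 64MB minimum part size.

package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		maxMultipartPutObjectSize = int64(1024 * 1024 * 1024 * 1024 * 5) // 5TiB
		maxPartsCount             = 10000
		minPartSize               = int64(1024 * 1024 * 64) // 64MB
	)
	// With an unknown stream length, the object size is assumed to be 5TiB.
	objectSize := maxMultipartPutObjectSize
	// Divide by the part limit, then round up to a multiple of minPartSize.
	partSize := math.Ceil(float64(objectSize / maxPartsCount))
	partSize = math.Ceil(partSize/float64(minPartSize)) * float64(minPartSize)
	fmt.Printf("part size: %.0f bytes (%.0f MB)\n", partSize, partSize/(1024*1024))
	// Prints: part size: 603979776 bytes (576 MB)
}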
I have no idea about the file size; files are being streamed to me externally.
In terms of catering for all use cases, how about limiting the size and number of buffers in the pool based on available memory? I feel like it's more useful to avoid crashing than it is to be able to upload a 5TB file. After all, you can't upload a 5TB file if you crash.
The problem I was having earlier was when I only had a few hundred MB of memory free on my machine. The very first attempt to create the ~600MB buffer failed.
But on a fresh 4GB machine with no other user processes running:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3951          47        3847           5          56        3746
Swap:             0           0           0
And having altered minio-go like this:
diff --git a/api-put-object-multipart.go b/api-put-object-multipart.go
index 6e0015a..1a4720a 100644
--- a/api-put-object-multipart.go
+++ b/api-put-object-multipart.go
@@ -24,6 +24,7 @@ import (
"io/ioutil"
"net/http"
"net/url"
+ "log"
"sort"
"strconv"
"strings"
@@ -56,7 +57,9 @@ func (c Client) putObjectMultipart(bucketName, objectName string, reader io.Read
var bufPool = sync.Pool{
New: func() interface{} {
_, partSize, _, _ := optimalPartInfo(-1)
+ log.Printf("will create a new buffer of size %d\n", partSize)
b := make([]byte, partSize)
+ log.Println("success!")
return &b
},
}
diff --git a/api-put-object-streaming.go b/api-put-object-streaming.go
index 0d4639e..041fae4 100644
--- a/api-put-object-streaming.go
+++ b/api-put-object-streaming.go
@@ -20,6 +20,7 @@ import (
"fmt"
"io"
"net/http"
+ "log"
"sort"
"strings"
@@ -28,6 +29,7 @@ import (
// PutObjectStreaming using AWS streaming signature V4
func (c Client) PutObjectStreaming(bucketName, objectName string, reader io.Reader) (n int64, err error) {
+ log.Printf("PutObjectStreaming called for %s\n", objectName)
return c.PutObjectWithProgress(bucketName, objectName, reader, nil, nil)
}
Trying to upload 2 (~10 byte) files (sequentially) normally results in:
2017/08/08 08:55:52 PutObjectStreaming called for wr_tests/a.file
2017/08/08 08:55:52 will create a new buffer of size 603979776
2017/08/08 08:55:53 success!
2017/08/08 08:55:53 PutObjectStreaming called for wr_tests/mvsubdir/file
2017/08/08 08:55:53 will create a new buffer of size 603979776
2017/08/08 08:55:54 success!
But sometimes, just running the same code over and over, I get:
2017/08/08 08:56:07 PutObjectStreaming called for wr_tests/a.file
2017/08/08 08:56:07 will create a new buffer of size 603979776
2017/08/08 08:56:08 success!
2017/08/08 08:56:08 will create a new buffer of size 603979776
signal: killed
I don't really understand where the kill signal is coming from, nor why a second buffer gets created before I see the log call for trying to stream the second file.
And, since I'm uploading files sequentially, why doesn't the first buffer get reused? Isn't that the point of having a pool? Why do I need to spend 1.2GB of memory to upload ~20 bytes of data?
Here's code that triggers the issue:
package main

import (
	"io"
	"io/ioutil"
	"log"
	"os"
	"path/filepath"

	"github.com/minio/minio-go"
)

func main() {
	tmpdir, err := ioutil.TempDir("", "upload_testing")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(tmpdir)

	upFile1 := filepath.Join(tmpdir, "file1")
	upBytes := []byte("upfile\n")
	err = ioutil.WriteFile(upFile1, upBytes, 0644)
	if err != nil {
		log.Fatal(err)
	}
	upFile2 := filepath.Join(tmpdir, "file2")
	err = ioutil.WriteFile(upFile2, upBytes, 0644)
	if err != nil {
		log.Fatal(err)
	}

	client, err := minio.NewV2("cog.sanger.ac.uk", os.Getenv("AWS_ACCESS_KEY_ID"), os.Getenv("AWS_SECRET_ACCESS_KEY"), true)
	if err != nil {
		log.Fatal(err)
	}
	// client.TraceOn(nil)
	bucket := "sb10"

	upload(client, bucket, upFile1)
	upload(client, bucket, upFile2)
}

func upload(client *minio.Client, bucket, path string) {
	rpipe, wpipe := io.Pipe()
	go func() {
		// in my real use case I'm not opening a file, I am streamed chunks of
		// data; simulate that here
		f, err := os.Open(path)
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()
		b := make([]byte, 7)
		_, err = f.Read(b)
		if err != nil {
			log.Fatal(err)
		}
		wpipe.Write(b)
		wpipe.Close()
	}()

	log.Printf("upload of %s...\n", path)
	_, err := client.PutObjectStreaming(bucket, filepath.Base(path), rpipe)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("upload succeeded")

	client.RemoveObject(bucket, filepath.Base(path))
}
If I run go run -tags netgo -race main.go then I mostly get:
2017/08/08 09:57:14 upload of /tmp/upload_testing054851088/file1...
2017/08/08 09:57:14 PutObjectStreaming called for file1
2017/08/08 09:57:14 will create a new buffer of size 603979776
2017/08/08 09:57:15 success!
2017/08/08 09:57:15 upload succeeded
2017/08/08 09:57:15 upload of /tmp/upload_testing054851088/file2...
2017/08/08 09:57:15 PutObjectStreaming called for file2
2017/08/08 09:57:15 will create a new buffer of size 603979776
signal: killed
But sometimes I get:
2017/08/08 09:57:49 upload of /tmp/upload_testing028591414/file1...
2017/08/08 09:57:49 PutObjectStreaming called for file1
2017/08/08 09:57:49 will create a new buffer of size 603979776
2017/08/08 09:57:50 success!
2017/08/08 09:57:50 will create a new buffer of size 603979776
signal: killed
So it gets killed 100% of the time, but sometimes creates the second buffer early. Why is it creating a second buffer?
(It seems to always work fine without -race, but I need my tests to pass cleanly with -race enabled.)
It looks like the first and single call to bufPool.Get() in putObjectMultipartNoStream() can result in bufPool's New function being called twice. Sequential calls also result in the New function being called each time, despite the bufPool.Put() calls.
From the docs for sync.Pool:
> Get may choose to ignore the pool and treat it as empty. Callers should not assume any relation between values passed to Put and the values returned by Get.
So maybe minio-go's behaviour isn't so surprising. As such I'd suggest that sync.Pool is not the ideal way to implement this.
It would be great if instead your pool guaranteed to return any previously created buffer if one is available, and didn't create more buffers than can fit in memory.
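For illustration, a minimal sketch of the kind of pool being suggested here (my rendering of the suggestion, not the library's implementation): a buffered channel caps the number of buffers that can ever exist, and Get always reuses a previously created buffer when one is free. bufferPool and newBufferPool are hypothetical names.

package main

import "fmt"

// bufferPool hands out at most cap(bufs) buffers; Get blocks when they are
// all in use, so total memory is bounded regardless of concurrency.
type bufferPool struct {
	bufs    chan []byte
	bufSize int64
}

func newBufferPool(maxBuffers int, bufSize int64) *bufferPool {
	p := &bufferPool{bufs: make(chan []byte, maxBuffers), bufSize: bufSize}
	for i := 0; i < maxBuffers; i++ {
		p.bufs <- nil // slots are reserved now, buffers are allocated lazily
	}
	return p
}

// Get returns a previously created buffer if one is free, allocating only
// when a slot has never been filled before.
func (p *bufferPool) Get() []byte {
	b := <-p.bufs
	if b == nil {
		b = make([]byte, p.bufSize)
	}
	return b
}

// Put makes the buffer available for reuse by a later Get.
func (p *bufferPool) Put(b []byte) {
	p.bufs <- b
}

func main() {
	pool := newBufferPool(2, 64*1024*1024) // at most two 64MB buffers, ever
	b := pool.Get()
	fmt.Println("got buffer of", len(b), "bytes")
	pool.Put(b)
}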
I was wrong about a single bufPool.Get() resulting in bufPool's New function being called twice. A single putObjectMultipartNoStream() call results in multiple bufPool.Get() calls, the last of which, at EOF, currently doesn't result in a Put(), which is a bug.
Fixing this doesn't solve my issue though, since sync.Pool still creates unnecessary buffers. Please see my pull request for a pool implementation that works with the above code.
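For illustration, a minimal sketch of the EOF issue described above and one way to avoid it (my sketch under those assumptions, not minio-go's actual loop): deferring the Put guarantees the buffer returns to the pool on every exit path, including the final read that hits EOF. uploadParts is a hypothetical name.

package main

import (
	"io"
	"log"
	"strings"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 64*1024) // small hypothetical part size for the sketch
		return &b
	},
}

// uploadParts consumes r part by part; the deferred Put ensures the buffer is
// returned even when the loop exits on EOF.
func uploadParts(r io.Reader) error {
	bufp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bufp)
	for {
		n, err := io.ReadFull(r, *bufp)
		if n > 0 {
			// ... a real implementation would upload (*bufp)[:n] here ...
			_ = (*bufp)[:n]
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // done; the buffer still goes back via the defer
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	if err := uploadParts(strings.NewReader("some streamed data")); err != nil {
		log.Fatal(err)
	}
}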
Like #404, I'm streaming data which could be over 1GB and trying to use PutObjectStreaming(), but this results in optimalPartInfo(-1) being called and a partSize of 603979776 being returned.
The problem I face is that this partSize is used as, essentially, the size of the read buffer, and I'm running out of memory (I end up with errors like fork/exec ...: cannot allocate memory after the stream completes). I've tried setting partSize = minPartSize in optimalPartInfo() when the input size is -1, and this solves my memory issue.
Regardless of exactly how it's done, can we get (or choose) a reasonably sized read buffer during PutObjectStreaming() so that we can stream in a constant, small amount of memory?