grailbio / base

A collection of Go utility packages used by GRAIL's tools
Apache License 2.0

Make S3 UploadPartSize a file.Opt #36

Closed bprosnitz closed 2 years ago

bprosnitz commented 2 years ago

A process calling file.Create("s3://...") to write ~100KB files showed low throughput. Profiles show significant time spent in GC, with the sync.Pool created in newUploader accounting for over 90% of heap usage.

I looked at a few ideas to avoid this:

  1. Make it possible to reuse sync.Pool across file creations.
  2. Make it possible to change the chunk size the pool uses.
  3. Start with a small buffer and, if writes grow past it, copy the data into a buffer from the sync.Pool. If the writer closes before the data outgrows the small buffer, upload the small buffer's contents.

I went with approach #2 in this PR because it seemed simplest.
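
For illustration only, approach #2 could look roughly like the sketch below. This is not the code in this PR: the Opts struct shape, the defaultUploadPartSize value, and the getBuf/putBuf helpers are assumptions made up for the example. Only the names newUploader, UploadPartSize, and the use of a sync.Pool come from the description above. The point is that the pool's New function allocates buffers of the configured part size, so a caller writing ~100KB files can ask for much smaller buffers.

```go
package main

import (
	"fmt"
	"sync"
)

// defaultUploadPartSize is an assumed default chosen for this sketch;
// it is not taken from the actual package.
const defaultUploadPartSize = 16 << 20

// Opts is a hypothetical options struct. UploadPartSize is the size in
// bytes of each pooled upload buffer; zero means "use the default".
type Opts struct {
	UploadPartSize int
}

// uploader is a stand-in for the S3 uploader. It only models the
// buffer pool, not the upload itself.
type uploader struct {
	partSize int
	bufPool  sync.Pool
}

// newUploader builds an uploader whose pool hands out buffers of the
// requested part size. The last non-zero UploadPartSize wins.
func newUploader(opts ...Opts) *uploader {
	partSize := defaultUploadPartSize
	for _, o := range opts {
		if o.UploadPartSize > 0 {
			partSize = o.UploadPartSize
		}
	}
	u := &uploader{partSize: partSize}
	u.bufPool.New = func() interface{} {
		// Store a pointer to the slice so Put does not allocate.
		buf := make([]byte, partSize)
		return &buf
	}
	return u
}

// getBuf takes a buffer of partSize bytes from the pool.
func (u *uploader) getBuf() *[]byte { return u.bufPool.Get().(*[]byte) }

// putBuf returns a buffer to the pool for reuse.
func (u *uploader) putBuf(buf *[]byte) { u.bufPool.Put(buf) }

func main() {
	// A caller writing small files asks for 128 KiB buffers.
	small := newUploader(Opts{UploadPartSize: 128 << 10})
	buf := small.getBuf()
	fmt.Println(len(*buf)) // 131072
	small.putBuf(buf)

	// Without an option, the assumed default applies.
	fmt.Println(len(*newUploader().getBuf())) // 16777216
}
```

With an option like this, the per-file pool still exists, but each buffer it allocates is sized to the workload rather than to the default part size, which is what reduces heap usage for small files.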