minio / warp

S3 benchmarking tool
GNU Affero General Public License v3.0

crash with high concurrency in warp put #301

Open harshavardhana opened 9 months ago

harshavardhana commented 9 months ago
warp put --tls --insecure --host 10.10.100.61:9000 --access-key minio --secret-key minio123 --autoterm --concurrent 168
panic: runtime error: slice bounds out of range [24560:16400]

goroutine 953 [running]:
github.com/secure-io/sio-go.(*EncReader).Read(0xc00033ebb0, {0xc002394000?, 0xc0072929a8?, 0x41d4f6?})
        github.com/secure-io/sio-go@v0.3.1/reader.go:57 +0x1ea
io.ReadAtLeast({0xe309e0, 0xc00033ebb0}, {0xc002394000, 0x2000, 0x2000}, 0x2000)
        io/io.go:335 +0x90
io.ReadFull(...)
        io/io.go:354
github.com/minio/warp/pkg/generator.(*scrambler).Read(0xc000640090, {0xc002394000?, 0x452ae9?, 0x2000?})
        github.com/minio/warp/pkg/generator/scambler.go:116 +0x6c
github.com/minio/minio-go/v7.(*hookReader).Read(0xc002c2aa80, {0xc002394000, 0x6?, 0x2000})
        github.com/minio/minio-go/v7@v7.0.66/hook-reader.go:76 +0xbe
io.discard.ReadFrom({}, {0x7f6a69f55180, 0xc0069cedb0})
        io/io.go:658 +0x6d
io.copyBuffer({0xe2f5e0, 0x13ba060}, {0x7f6a69f55180, 0xc0069cedb0}, {0x0, 0x0, 0x0})
        io/io.go:416 +0x147
io.Copy(...)
        io/io.go:389
net/http.(*transferWriter).doBodyCopy(0xc006054a00, {0xe2f5e0?, 0x13ba060?}, {0x7f6a69f55180?, 0xc0069cedb0?})
        net/http/transfer.go:412 +0x48
net/http.(*transferWriter).writeBody(0xc006054a00, {0xe2fa20, 0xc003212140})
        net/http/transfer.go:375 +0x408
net/http.(*Request).write(0xc006ffdc00, {0xe2fa20, 0xc003212140}, 0x0, 0x0, 0x0)
        net/http/request.go:738 +0xbad
net/http.(*persistConn).writeLoop(0xc0039c8a20)
        net/http/transport.go:2424 +0x18f
created by net/http.(*Transport).dialConn in goroutine 2172
        net/http/transport.go:1777 +0x16f1

klauspost commented 9 months ago

Maybe @aead can help a bit, since the panic is in sio? I haven't looked at the code, but it could also be concurrent access to the reader.

harshavardhana commented 7 months ago

@aead ^^

klauspost commented 7 months ago

I think this could actually be related to the issues we are having with multipart uploads.

harshavardhana commented 7 months ago

> I think this could actually be related to the issues we are having with multipart uploads.

Which one @klauspost ?

klauspost commented 7 months ago

@harshavardhana The one that forced us to turn off checksums on multipart replication or tiering - forget which.

harshavardhana commented 7 months ago

> @harshavardhana The one that forced us to turn off checksums on multipart replication or tiering - forget which.

We didn't turn off checksums for that. We turned off computing SHA256 and MD5 sums, which are expensive.

CRC checksums are still enabled.

akshay8043 commented 1 month ago

Sorry to jump in, but I think I may be in the same boat.

I am running warp mixed to fill a bucket on an NVMe object store with 500-600 million objects, each smaller than 150 KB, at a concurrency of 500 across 2 clients.

I think the clients are being killed automatically, and my warp scripts stop.

warp put doesn't have an option to upload a set number of objects, which is why I am using warp mixed with put-distrib set to 100 and all other distributions set to zero.

  1. Feature request: add a number-of-objects option / parameter to warp put.
  2. What could cause the warp client to show "Killed"?

klauspost commented 1 month ago

@akshay8043 You are just running out of memory, and that is not related to this. Use --stress, and requests will no longer be logged. Use warp get if you want to upload a specific number of objects.

romayalon commented 1 month ago

Hey @klauspost, we also see the warp client crash when running with high concurrency, and --stress did not help. Are there any recommendations for debugging this issue? This is the command we run:

warp versioned --host="$host_address" --access-key="$access_key" --secret-key="$secret_key" --obj.size=1k --duration=1h --stress --objects=10000 --concurrent=100 --bucket="bucket1" --insecure --tls

klauspost commented 1 month ago

@romayalon Provide a trace from the crash. Without that there is nothing to go on.

romayalon commented 3 weeks ago

@klauspost Is there a way to get a trace if the server is not MinIO? We run NooBaa as the server. This is all I got from the person who ran it: warp dies, 351316 Killed
warp versioned --host={10 hosts addresses} --access-key="$access_key" --secret-key="$secret_key" --obj.size=1k --stress --duration=8h --objects=10000 --concurrent=1000 --bucket="bucket5004" --insecure --tls

klauspost commented 3 weeks ago

@romayalon Sounds like you are getting OOM killed.

romayalon commented 3 weeks ago

@klauspost I thought so too, but we usually see an OOMKilled 137 error in that case. Is there a way to get warp logs?

klauspost commented 3 weeks ago

@romayalon Either way it is being killed externally.

romayalon commented 3 weeks ago

Update for the community: we found proof in /var/log/messages that warp was OOM-killed: kernel: Out of memory: Killed process <pid> (warp)