minio / minio

The Object Store for AI Data Infrastructure
https://min.io/download
GNU Affero General Public License v3.0

MinIO service restarting constantly after upgrade; total used storage and number of objects also reduced #14726

Closed saurabhp0119 closed 2 years ago

saurabhp0119 commented 2 years ago


MinIO services are constantly restarting after upgrading to the latest version; I can't even run `mc admin info`.

Error from mc when running `mc admin info minio01`:

mc: Unable to get service status: json: cannot unmarshal number into Go struct field DiskMetrics.servers.drives.metrics.apiLatencies of type string

Also, mc console logs are not working.

We ran the minio server from the command line, and when it got killed, the log was as follows: error.log

harshavardhana commented 2 years ago

You need to upgrade mc as well @saurabhp0119
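A minimal sketch of one way to do that, assuming mc was installed as the standalone release binary and the alias is still named minio01:

```sh
# Update the mc client binary in place to the latest release
mc update

# Confirm the new client version and retry the failing call
mc --version
mc admin info minio01
```

If mc was installed through a package manager rather than as the standalone release binary, it would need to be updated through that package manager instead.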

harshavardhana commented 2 years ago

Also, the error log doesn't look like the entire crash; please capture the entire crash.

saurabhp0119 commented 2 years ago

Here is the full log file from the server: minio_temp_error.log

On another note, we are using Sidekick (version 0.5.5) to interact with MinIO. Does that also need to be upgraded? Can an old Sidekick version cause the MinIO server to restart?

harshavardhana commented 2 years ago

> Here is the full log file from the server: minio_temp_error.log

There is no crash here @saurabhp0119, please collect similar logs from all 20 servers and attach them here.
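For reference, a sketch of one way to capture the complete crash output on each node, assuming MinIO runs as a systemd unit named minio.service (adjust the unit name to your deployment):

```sh
# Dump everything the MinIO unit has logged since the current boot,
# including any full Go "fatal error" stack trace, into one file per host
journalctl -u minio.service -b --no-pager > "minio-crash-$(hostname).log"
```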

harshavardhana commented 2 years ago

Also, note that you need to comply with the AGPLv3 license here @saurabhp0119; please read https://github.com/minio/minio/blob/master/COMPLIANCE.md

saurabhp0119 commented 2 years ago

Yes, I will go through the license. Also, here is the complete crash log from one of the servers: error.log

harshavardhana commented 2 years ago

You are running out of memory

fatal error: runtime: out of memory

runtime stack:
runtime.throw({0x26338eb, 0x400000})
        runtime/panic.go:1198 +0x71
runtime.sysMap(0xdd92000000, 0x42b020, 0xc028099e90)
        runtime/mem_linux.go:169 +0x96
runtime.(*mheap).grow(0x5ee3700, 0x9)
        runtime/mheap.go:1393 +0x225
runtime.(*mheap).allocSpan(0x5ee3700, 0x9, 0x0, 0x79)
        runtime/mheap.go:1179 +0x165
runtime.(*mheap).alloc.func1()
        runtime/mheap.go:913 +0x69
runtime.systemstack()
        runtime/asm_amd64.s:383 +0x49

goroutine 25716857 [running]:
runtime.systemstack_switch()
        runtime/asm_amd64.s:350 fp=0xc258854c98 sp=0xc258854c90 pc=0x4696a0
runtime.(*mheap).alloc(0x1f1ea78, 0x476112, 0xf3, 0x1)
        runtime/mheap.go:907 +0x73 fp=0xc258854ce8 sp=0xc258854c98 pc=0x427353
runtime.(*mcentral).grow(0x1d815d3)
        runtime/mcentral.go:241 +0x65 fp=0xc258854d30 sp=0xc258854ce8 pc=0x417a65
runtime.(*mcentral).cacheSpan(0x5ef9408)
        runtime/mcentral.go:161 +0x69e fp=0xc258854da8 sp=0xc258854d30 pc=0x41789e
runtime.(*mcache).refill(0x7f038a3a9688, 0x79)
        runtime/mcache.go:162 +0xaf fp=0xc258854df8 sp=0xc258854da8 pc=0x416acf
runtime.(*mcache).nextFree(0x7f038a3a9688, 0x79)
        runtime/malloc.go:886 +0x85 fp=0xc258854e40 sp=0xc258854df8 pc=0x40c9a5
runtime.mallocgc(0x400a, 0x2197020, 0x1)
        runtime/malloc.go:1077 +0x4e8 fp=0xc258854ec0 sp=0xc258854e40 pc=0x40d028
runtime.makeslice(0x259a2e0, 0xdd91e6ef00, 0x203764)
        runtime/slice.go:98 +0x52 fp=0xc258854ee8 sp=0xc258854ec0 pc=0x44e972
github.com/klauspost/compress/s2.NewReader({0x0, 0x0}, {0xc258854f58, 0x1, 0xb})
        github.com/klauspost/compress@v1.14.4/s2/decode.go:99 +0x28e fp=0xc258854f20 sp=0xc258854ee8 pc=0xf802ce
github.com/minio/minio/cmd.glob..func3()
        github.com/minio/minio/cmd/metacache-stream.go:238 +0x58 fp=0xc258854f70 sp=0xc258854f20 pc=0x1f1ea78
sync.(*Pool).Get(0x5e6c860)
        sync/pool.go:148 +0xb2 fp=0xc258854fa8 sp=0xc258854f70 pc=0x476112
github.com/minio/minio/cmd.newMetacacheReader({0x49fa420, 0xdd8e3f7528})

harshavardhana commented 2 years ago

Most probably you have something else running on the system that is using up the memory and not allowing MinIO to use sufficient memory for its internal routines.
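A quick sketch of commands to verify what is consuming memory on a node (generic Linux tooling, nothing specific to this deployment):

```sh
# Overall memory picture on the host
free -h

# Largest resident-memory consumers, biggest first
ps -eo pid,comm,rss --sort=-rss | head -n 15
```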

Your namespace also looks quite large and deeply nested; this looks like expected behavior here, so just provide enough RAM.

You should engage with us at https://min.io/pricing to get an architecture overview here.

saurabhp0119 commented 2 years ago

But it is a 128 GB RAM machine, and there is nothing else running on it. Before the version update, the same machines were supporting the same data.

harshavardhana commented 2 years ago

> But it is a 128 GB RAM machine, and there is nothing else running on it. Before the version update, the same machines were supporting the same data.

It doesn't matter; the memory is not enough right now. Something has changed and this requires deeper investigation, which we unfortunately won't be able to do without a subscription.

saurabhp0119 commented 2 years ago

> But it is a 128 GB RAM machine, and there is nothing else running on it. Before the version update, the same machines were supporting the same data.

> It doesn't matter; the memory is not enough right now. Something has changed and this requires deeper investigation, which we unfortunately won't be able to do without a subscription.

Okay, then I will try with more RAM. Thanks for your reply.

saurabhp0119 commented 2 years ago

I have another question. One of the disks on one of the servers shows unusually low usage compared to the others, like this:

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdj    9.1T  7.5T  1.7T   82%   /mnt/drive10
/dev/sda    9.1T  7.6T  1.6T   84%   /mnt/drive1
/dev/sdh    9.1T  7.6T  1.6T   84%   /mnt/drive8
/dev/sdf    9.1T  7.4T  1.8T   81%   /mnt/drive6
/dev/sde    9.1T  7.4T  1.8T   82%   /mnt/drive5
/dev/sdg    9.1T  7.4T  1.8T   82%   /mnt/drive7
/dev/sdc    9.1T  7.6T  1.6T   83%   /mnt/drive3
/dev/sdb    9.1T  7.3T  1.9T   80%   /mnt/drive2
/dev/sdi    9.1T  7.7T  1.5T   85%   /mnt/drive9
/dev/sdd    9.1T  3.1T  6.1T   34%   /mnt/drive4

Can you let me know why this would be? This is the only disk that is so low; how can I resolve it? Do I need to create a new ticket for this?

harshavardhana commented 2 years ago

> I have another question. One of the disks on one of the servers shows unusually low usage compared to the others, like this:
>
> /dev/sdj 9.1T 7.5T 1.7T 82% /mnt/drive10
> /dev/sda 9.1T 7.6T 1.6T 84% /mnt/drive1
> /dev/sdh 9.1T 7.6T 1.6T 84% /mnt/drive8
> /dev/sdf 9.1T 7.4T 1.8T 81% /mnt/drive6
> /dev/sde 9.1T 7.4T 1.8T 82% /mnt/drive5
> /dev/sdg 9.1T 7.4T 1.8T 82% /mnt/drive7
> /dev/sdc 9.1T 7.6T 1.6T 83% /mnt/drive3
> /dev/sdb 9.1T 7.3T 1.9T 80% /mnt/drive2
> /dev/sdi 9.1T 7.7T 1.5T 85% /mnt/drive9
> /dev/sdd 9.1T 3.1T 6.1T 34% /mnt/drive4
>
> Can you let me know why this would be? This is the only disk that is so low; how can I resolve it? Do I need to create a new ticket for this?

You may have some temporary objects left over; I am not sure why that might be.
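A sketch of how one might check that from the affected node, assuming the standard per-drive layout where MinIO keeps transient data under a hidden .minio.sys directory on each mount (the /mnt/drive* paths below mirror the ones in the report):

```sh
# Size of MinIO's internal temporary area on every drive, to see whether
# leftover temporary data accounts for the difference between drives
du -sh /mnt/drive*/.minio.sys/tmp
```

If your on-disk layout differs, adjust the paths accordingly; if nothing obvious shows up there, a separate issue with the per-drive output is probably the cleaner path.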