mistralai / mistral-inference

Official inference library for Mistral models
https://mistral.ai/
Apache License 2.0

Building Mistral docker container results in OOM kill of the entire system #91

Open codevbus opened 7 months ago

codevbus commented 7 months ago

Following this instruction:

docker build deploy --build-arg MAX_JOBS=8

results in an OOM kill of my system. It completely dropped the X session and sent me back to my display manager login.
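One possible workaround, sketched below, is to derive `MAX_JOBS` from the memory that is actually available instead of hard-coding 8. The helper `suggest_max_jobs` and the 8 GiB per-job budget are assumptions made for illustration, not values taken from the repository; the real per-job footprint of the compile step would have to be measured.

```shell
#!/bin/sh
# suggest_max_jobs MEM_KB PER_JOB_KB CPUS
# Hypothetical helper: prints min(CPUS, MEM_KB / PER_JOB_KB), never less than 1.
suggest_max_jobs() {
    n=$(( $1 / $2 ))
    if [ "$n" -gt "$3" ]; then n=$3; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# MemAvailable in /proc/meminfo is reported in kB; fall back to 0 if unreadable.
mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_avail_kb=${mem_avail_kb:-0}

# ASSUMPTION: budget 8 GiB of RAM per parallel compile job (placeholder value).
per_job_kb=$((8 * 1024 * 1024))

cpus=$(nproc 2>/dev/null || echo 1)
max_jobs=$(suggest_max_jobs "$mem_avail_kb" "$per_job_kb" "$cpus")

# Print the command rather than running it, so the value can be reviewed first.
echo "docker build deploy --build-arg MAX_JOBS=$max_jobs"
```

Lowering `MAX_JOBS` trades build time for a smaller peak memory footprint, so this should reduce the chance of the OOM killer firing, though it does not guarantee it.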

Dec 17 23:51:04 ws01 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_>
Dec 17 23:51:04 ws01 systemd[1]: session-2.scope: A process of this unit has been killed by the OOM kill>
Dec 17 23:51:05 ws01 kernel: cicc invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|>
Dec 17 23:51:05 ws01 kernel: CPU: 21 PID: 179121 Comm: cicc Tainted: P           OE      6.6.7-arch1-1 #>
Dec 17 23:51:05 ws01 kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING E>
Dec 17 23:51:05 ws01 kernel: Call Trace:
Dec 17 23:51:05 ws01 kernel:  <TASK>
Dec 17 23:51:05 ws01 kernel:  dump_stack_lvl+0x47/0x60
Dec 17 23:51:05 ws01 kernel:  dump_header+0x4a/0x240
Dec 17 23:51:05 ws01 kernel:  oom_kill_process+0xf9/0x190
Dec 17 23:51:05 ws01 kernel:  out_of_memory+0x246/0x590
Dec 17 23:51:05 ws01 kernel:  __alloc_pages_slowpath.constprop.0+0xa5f/0xd90
Dec 17 23:51:05 ws01 kernel:  __alloc_pages+0x32d/0x350
Dec 17 23:51:05 ws01 kernel:  __folio_alloc+0x1b/0x50
Dec 17 23:51:05 ws01 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 17 23:51:05 ws01 kernel:  vma_alloc_folio+0xa0/0x380
Dec 17 23:51:05 ws01 kernel:  do_anonymous_page+0x71/0x3b0
Dec 17 23:51:05 ws01 kernel:  __handle_mm_fault+0xbdd/0xd90
Dec 17 23:51:05 ws01 kernel:  handle_mm_fault+0x17f/0x360
Dec 17 23:51:05 ws01 kernel:  do_user_addr_fault+0x15b/0x660
Dec 17 23:51:05 ws01 kernel:  exc_page_fault+0x7f/0x180
Dec 17 23:51:05 ws01 kernel:  asm_exc_page_fault+0x26/0x30
Dec 17 23:51:05 ws01 kernel: RIP: 0033:0x7f3710bd524b
Dec 17 23:51:05 ws01 kernel: Code: 5b 6a 17 00 31 d2 48 8d 34 29 48 39 fb 48 89 73 60 0f 95 c2 48 29 e8 >
Dec 17 23:51:05 ws01 kernel: RSP: 002b:00007ffd7d8b5790 EFLAGS: 00010206
Dec 17 23:51:05 ws01 kernel: RAX: 00000000000208d1 RBX: 00007f3710d4bc80 RCX: 000000002cf1f720
Dec 17 23:51:05 ws01 kernel: RDX: 0000000000010011 RSI: 000000002cf2f730 RDI: 00007f3710d4bc80
Dec 17 23:51:05 ws01 kernel: RBP: 0000000000010010 R08: 0000000000000000 R09: 000000000000007e
Dec 17 23:51:05 ws01 kernel: R10: 000000002cf2d000 R11: 0000000000000206 R12: 000000000000d8e0
Dec 17 23:51:05 ws01 kernel: R13: 000000002cf1f720 R14: 0000000000001000 R15: 0000000000010030
Dec 17 23:51:05 ws01 kernel:  </TASK>
Dec 17 23:51:05 ws01 kernel: Mem-Info:
Dec 17 23:51:05 ws01 kernel: active_anon:1140278 inactive_anon:14868754 isolated_anon:0
                              active_file:1863 inactive_file:2917 isolated_file:0
                              unevictable:4058 dirty:2244 writeback:0
                              slab_reclaimable:42620 slab_unreclaimable:81781
                              mapped:141523 shmem:89532 pagetables:54344
                              sec_pagetables:0 bounce:0
                              kernel_misc_reclaimable:0
                              free:88166 free_pcp:103 free_cma:0
Dec 17 23:51:05 ws01 kernel: Node 0 active_anon:4561112kB inactive_anon:59475016kB active_file:7960kB in>
Dec 17 23:51:05 ws01 kernel: Node 0 DMA free:11264kB boost:0kB min:12kB low:24kB high:36kB reserved_high>
Dec 17 23:51:05 ws01 kernel: lowmem_reserve[]: 0 3166 64184 64184 64184
Dec 17 23:51:05 ws01 kernel: Node 0 DMA32 free:247384kB boost:0kB min:3332kB low:6572kB high:9812kB rese>
Dec 17 23:51:05 ws01 kernel: lowmem_reserve[]: 0 0 61018 61018 61018
Dec 17 23:51:05 ws01 kernel: Node 0 Normal free:94016kB boost:122880kB min:187112kB low:249592kB high:31>
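
For readers unfamiliar with the `Mem-Info` block: the counters there are in pages, not bytes. A rough conversion, assuming 4 KiB pages (consistent with the `Node 0` line, where 1140278 pages of `active_anon` are reported as 4561112kB), shows about 61 GiB of anonymous memory in use and only about 344 MiB free at the time of the kill. The helper name `pages_to_mib` is made up for this sketch, and the anonymous total covers every process on the machine, not only the build.

```shell
#!/bin/sh
# Hypothetical helper: convert a page count to MiB, assuming 4 KiB pages.
pages_to_mib() {
    echo $(( $1 * 4 / 1024 ))
}

# Values copied from the Mem-Info block in the log above.
active_anon=1140278
inactive_anon=14868754
free=88166

anon_mib=$(pages_to_mib $((active_anon + inactive_anon)))
free_mib=$(pages_to_mib "$free")

echo "anonymous memory: ${anon_mib} MiB"
echo "free memory: ${free_mib} MiB"
```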