quantum / esos

An open source, high performance, block-level storage platform.
http://www.esos-project.com/

Not Booting on Dell Poweredge R7515 #265

Closed 4censord closed 3 years ago

4censord commented 3 years ago

I'm trying to use ESOS on a new Dell PowerEdge R7515.

I have tried different versions, going down to 1.3.9. I always get as far as GRUB, but nothing happens after that. The debug entry shows nothing either.

The USB stick boots fine in a VM, and I have also tried several different sticks. I have tried both BIOS and UEFI mode.

As it doesn't boot, it doesn't generate any logs. There is also nothing on the console.

For the hardware: it's a Dell PowerEdge R7515 with an AMD EPYC 7232P, 32 GB RAM, a PERC H730P RAID controller, and 6x 8 TB 12 Gb/s SAS drives.

msmith626 commented 3 years ago

I recall Dell BIOS/setup used to have an option to control the USB "hard drive" emulation (automatic and something else). Does that still exist in these newer Dell machines? Does your USB flash drive appear as removable media or a fixed disk? Some nicer SanDisk USB flash drives have this bit flipped so they appear as a fixed disk rather than removable media and might behave differently.
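On a running Linux system, one way to see how a drive is reported is the `removable` attribute under `/sys/block`. The following is only a minimal sketch: the helper name `is_removable` is made up for illustration, and it shows what the kernel reports, which may or may not match how the Dell firmware treats the stick at boot.

```python
from pathlib import Path


def is_removable(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the kernel reports the block device as removable media.

    Linux exposes this as "0" or "1" in /sys/block/<device>/removable.
    """
    flag = Path(sysfs_root) / device / "removable"
    return flag.read_text().strip() == "1"


if __name__ == "__main__":
    root = Path("/sys/block")
    # Only meaningful on Linux; do nothing on systems without sysfs.
    if root.is_dir():
        for name in sorted(p.name for p in root.iterdir()):
            if (root / name / "removable").is_file():
                kind = "removable" if is_removable(name) else "fixed"
                print(f"{name}: {kind}")
```

A stick that shows up as `fixed` here would be one of the drives with the bit flipped.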

Another idea is to simply write out the ESOS image to some other local disk in the system (e.g., SAS) and try booting from that (not USB flash).

4censord commented 3 years ago

Hi, thanks for the tip about the USB "hard drive" emulation; that gave me a nudge in the right direction.

Sadly, now it's kernel panicking right at the start.

```
[ 18.462200] ------------[ cut here ]------------
[ 18.517500] kernel BUG at arch/x86/kernel/apic/apic.c:1629!
[ 18.584185] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 18.653013] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.69-esos.debug #1
[ 18.735276] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.5.3 07/27/2020
[ 18.825869] RIP: 0010:setup_local_APIC+0xe3/0x366
[ 18.882126] Code: e8 bc ad 2b 00 4c 8b 25 b1 07 8a 02 49 8d bc 24 a0 00 00 00 e8 a8 ad 2b 00 49 8b 84 24 a0 00 00 00 e8 87 8c b9 01 85 c0 75 02 <0f> 0b 48 c7 c7 40 89 90 83 4c 8d 7d d8 e8 85 ad 2b 00 4c 8b 25 7a
[ 19.106870] RSP: 0000:ffffffff83a07d40 EFLAGS: 00010246
[ 19.169375] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff810708a4
[ 19.254759] RDX: fffffbfff0902d2c RSI: 0000000000000008 RDI: ffffffff84816958
[ 19.340145] RBP: ffffffff83a07e40 R08: fffffbfff0902d2c R09: 00000000000c0003
[ 19.425528] R10: fffffbfff0902d2c R11: 0000000000000000 R12: ffffffff83908960
[ 19.510913] R13: 1ffffffff0740fcf R14: ffffffff82e009c0 R15: ffffffff82e009c0
[ 19.596300] FS: 0000000000000000(0000) GS:ffff888723800000(0000) knlGS:0000000000000000
[ 19.693123] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 19.761868] CR2: ffff88887f1ff000 CR3: 0000000003a2a000 CR4: 00000000000406b0
[ 19.847253] Call Trace:
[ 19.876481] ? vprintk_func+0xc0/0xca
[ 19.920262] ? lapic_timer_set_oneshot+0x36/0x36
[ 19.975486] ? printk+0xb2/0xe3
[ 20.013030] ? rcu_read_unlock_sched_notrace+0x15/0x15
[ 20.074498] ? __bitmap_weight+0x5e/0x6d
[ 20.121402] apic_intr_mode_init+0xe3/0xff
[ 20.170385] x86_late_time_init+0x48/0x4f
[ 20.218330] start_kernel+0x573/0x652
[ 20.262113] ? mem_encrypt_init+0xb/0xb
[ 20.307978] x86_64_start_reservations+0x29/0x2b
[ 20.363201] x86_64_start_kernel+0x77/0x7b
[ 20.412187] secondary_startup_64+0xa4/0xb0
[ 20.462209] Modules linked in:
[ 20.498736] ---[ end trace afafcf4e5a6774dd ]---
[ 20.554053] RIP: 0010:setup_local_APIC+0xe3/0x366
[ 20.610317] Code: e8 bc ad 2b 00 4c 8b 25 b1 07 8a 02 49 8d bc 24 a0 00 00 00 e8 a8 ad 2b 00 49 8b 84 24 a0 00 00 00 e8 87 8c b9 01 85 c0 75 02 <0f> 0b 48 c7 c7 40 89 90 83 4c 8d 7d d8 e8 85 ad 2b 00 4c 8b 25 7a
[ 20.835066] RSP: 0000:ffffffff83a07d40 EFLAGS: 00010246
[ 20.897567] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff810708a4
[ 20.982951] RDX: fffffbfff0902d2c RSI: 0000000000000008 RDI: ffffffff84816958
[ 21.068337] RBP: ffffffff83a07e40 R08: fffffbfff0902d2c R09: 00000000000c0003
[ 21.153722] R10: fffffbfff0902d2c R11: 0000000000000000 R12: ffffffff83908960
[ 21.239106] R13: 1ffffffff0740fcf R14: ffffffff82e009c0 R15: ffffffff82e009c0
[ 21.324491] FS: 0000000000000000(0000) GS:ffff888723800000(0000) knlGS:0000000000000000
[ 21.421315] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 21.490059] CR2: ffff88887f1ff000 CR3: 0000000003a2a000 CR4: 00000000000406b0
[ 21.575446] Kernel panic - not syncing: Attempted to kill the idle task!
[ 21.655645] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
```

The only kernel option I have changed is setting the output to the serial console with "console=ttyS0". I have been able to replicate this with both 3.0.0 and the new 3.0.1 version. I haven't yet had the time to try any previous versions.

log_with_loglevel=7.txt
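When comparing a trace like this against other reports, the useful facts are the BUG location and the function named in the RIP line. Below is a minimal Python sketch that pulls those out of a log string. The function name `summarize_oops` and the regular expressions are illustrative assumptions based on the lines in this trace, not part of any kernel tooling.

```python
import re

# Patterns modeled on the two lines of interest in the trace in this thread.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
RIP_RE = re.compile(r"RIP: [0-9a-fA-F]+:(?P<function>\w+)\+0x")


def summarize_oops(log_text: str) -> dict:
    """Extract the BUG file/line and the faulting function from a kernel log.

    Keys are omitted when the corresponding line is not found.
    """
    summary = {}
    bug = BUG_RE.search(log_text)
    if bug:
        summary["file"] = bug.group("file")
        summary["line"] = int(bug.group("line"))
    rip = RIP_RE.search(log_text)
    if rip:
        summary["function"] = rip.group("function")
    return summary


if __name__ == "__main__":
    sample = (
        "[ 18.517500] kernel BUG at arch/x86/kernel/apic/apic.c:1629!\n"
        "[ 18.825869] RIP: 0010:setup_local_APIC+0xe3/0x366\n"
    )
    print(summarize_oops(sample))
```

For the trace in this comment, the result points at `arch/x86/kernel/apic/apic.c` line 1629 in `setup_local_APIC`, which is enough to search for matching reports.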

msmith626 commented 3 years ago

Seems someone else asked about the same BUG_ON() that you're hitting (they have Dell hardware too): https://lkml.org/lkml/2020/2/21/1501

Looks like you're hitting this: /*

Are you passing any different kernel parameters than what ESOS comes with as defaults? Does booting the -esos.prod kernel exhibit the same behavior (you have -esos.debug booted in this trace output)?

--Marc

On Fri, Nov 6, 2020 at 5:28 AM 4censord notifications@github.com wrote:

4censord commented 3 years ago

It's working now, so I will mark this as closed. I disabled "X2APIC" in the BIOS and now it boots on the first try.

> Are you passing any different kernel parameters than what ESOS comes with as defaults?

I had changed the "console=tty0" parameter to "console=ttyS0" because nothing was being output on VGA or the iDRAC remote console. Disabling "X2APIC" seems to have fixed this as well. For the log file I also added "loglevel=7".
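The two parameter changes described here amount to replacing one `key=value` token and appending another. A small Python sketch of that edit follows. The helper `set_kernel_param` and the starting command line are hypothetical (the real ESOS defaults are not shown in this thread), and splitting on whitespace assumes no quoted values containing spaces.

```python
def set_kernel_param(cmdline: str, key: str, value: str) -> str:
    """Set key=value on a kernel command line.

    The first existing `key=...` token is replaced in place, any later
    duplicates of the key are dropped, and the token is appended if absent.
    """
    prefix = key + "="
    out = []
    replaced = False
    for token in cmdline.split():
        if token.startswith(prefix):
            if not replaced:
                out.append(prefix + value)
                replaced = True
        else:
            out.append(token)
    if not replaced:
        out.append(prefix + value)
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical starting point, not the actual ESOS default command line.
    cmdline = "rootwait console=tty0"
    cmdline = set_kernel_param(cmdline, "console", "ttyS0")
    cmdline = set_kernel_param(cmdline, "loglevel", "7")
    print(cmdline)
```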

> Does booting the -esos.prod kernel exhibit the same behavior (you have -esos.debug booted in this trace output)?

Yes.

Thanks for your help.