twizzler-operating-system / twizzler

The Twizzler Operating System
BSD 3-Clause "New" or "Revised" License

Qemu crashes soon after Limine boots on a fresh Ubuntu 20.04 install #75

Closed NateHerman closed 2 years ago

NateHerman commented 2 years ago

Following directions in doc/src/BUILD.md, everything went fine until: cargo start-qemu

then I get:

BdsDxe: failed to load Boot0001 "UEFI QEMU DVD-ROM QM00005 " from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x2,0xFFFF,0x0): Not Found
BdsDxe: loading Boot0002 "UEFI QEMU HARDDISK QM00001 " from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x0,0xFFFF,0x0)
BdsDxe: starting Boot0002 "UEFI QEMU HARDDISK QM00001 " from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x0,0xFFFF,0x0)

right as Limine's interface comes up in qemu . . . Limine then boots automatically and the following:

[Image: Screenshot from 2022-06-08 11-06-36]

is all that flashes quickly before Qemu crashes.

the same happens with: cargo start-qemu --profile release

I'm on a fresh install of Mint Linux 20 Cinnamon (which is based on Ubuntu 20.04). It's not a virtual machine install; it's on a Gateway ZX6900 machine.

I don't see how to put a "Bug" Label on this Issue. I'm guessing I don't have permissions to do it? Maybe someone who does can label this "Bug".

dbittman commented 2 years ago

When Qemu crashes, does qemu itself crash or just exit? Does it print anything, or is there anything in the dmesg log? Also, what version of qemu are you using?

I'll try to reproduce. Thanks.

NateHerman commented 2 years ago

thanks for getting back to me

qemu-system-x86_64 --version
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.21)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

I did:
dmesg | grep -i qemu
dmesg | grep -i cargo
dmesg | grep -i kvm

the only thing I got anything for was kvm:

[ 11.231259] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround

which could be unrelated

after cargo start-qemu ended I did a:

echo $? and it came up 0

is there anything else you need me to do which will be helpful?

circutrider21 commented 2 years ago

Add -no-reboot and -no-shutdown to QEMU's command line. Together, these flags make QEMU pause on a triple fault instead of rebooting or exiting. You can then run info registers in QEMU's monitor to dump the CPU state, including the address of the faulting instruction (RIP).

NateHerman commented 2 years ago

@circutrider21

I added -no-reboot and -no-shutdown to QEMU's command line, ran cargo start-qemu, and it looks like it gets a step further to where the qemu ui has QEMU[Paused] at the top of it.

Then I tried opening a new terminal in the same directory and then typed info registers and get: info: No menu item 'registers' in node '(dir)Top'

The same thing happens when I close out of qemu ui and type info registers in the original terminal: info: No menu item 'registers' in node '(dir)Top'

I then did some research in google searching "info registers in QEMU's monitor" and found this page and tried both suggestions:

1) added -monitor stdio to qemu's command line and got errors:
QEMU 4.2.1 monitor - type 'help' for more information
(qemu) qemu-system-x86_64: -serial mon:stdio: cannot use stdio by multiple character devices
qemu-system-x86_64: -serial mon:stdio: could not connect serial device to character backend 'mon:stdio'
Error: qemu return with error

2) added -s to qemu's command line and got errors:
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.8000000AH:EDX.tsc-scale [bit 4]
qemu-system-x86_64: -s: Failed to find an available port: Address already in use
Error: qemu return with error

I looked up all the errors and didn't get anything I could decipher into something useful. Keep in mind though that I'm relatively new to linux, rust, and programming.

Do you have any thoughts what to try next? Thanks.

BTW, on your github you said "I’m looking to collaborate on anything System Programming" so I thought I'd throw it out there why I wanna get twizzler up and running and see if it sounds interesting at all to you and if you'd be inclined to tinker with what I want to try out. Basically, I wanna build an OS based on twizzler that can scale to world-scale and sorta just works as a single big computer and also gets rid of the command line in favor of a 3d OS interface. I really need help understanding and getting good at systems programming, and the first things to implement on twizzler would be a rholang compiler / interpreter (rholang's fascinating: a language that has concurrency / distributedness baked in from the bottom up as well as other really interesting stuff that sorta qualifies it as the only language up to the task I'm envisioning) and a 3d visual rholang editor as well as 3d OS navigation / visualization based on Projective Geometric Algebra. I have an idea of what I want the 3d environments to be like.

Anyway, lemme know if you want more info. I'd prefer to get on a video chat though 'cause I'm not the best at typing stuff out. Plus it'd be easier to pick your brain / get your opinion on things that way. Anyway, not sure if this was the place to ask it (b/c I'm somewhat new to using github) but there it is.

Cool, and thanks for lending a hand moving this bug along.

Sincerely, Nate

circutrider21 commented 2 years ago

@NateHerman you did a good job thoroughly describing the issue. Judging from your feedback, it seems you are having trouble accessing the QEMU console (in order to type info registers). To open QEMU's console, simply click on view in the toolbar, then click on monitor. You can also pass -monitor stdio to QEMU to have the monitor open in the same terminal QEMU was launched from (which I'm assuming you've already discovered). The QEMU console itself is not a terminal, so it only accepts monitor commands such as info registers.

To answer your systems programming question, GitHub isn't the best of places to find help. Instead, I would like to recommend the excellent OSDEV discord, which is where I (amongst other experienced programmers) hang out to help people starting out. The owner of this server also happens to be the creator of limine. Also in this server is the creator of the rust bindings to stivale2. You can join this server here
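The "cannot use stdio by multiple character devices" error quoted earlier in the thread appears when two options both claim the host's stdio, here -monitor stdio next to an existing -serial mon:stdio. As a rough sketch only, a launcher written in Rust could check for that before starting QEMU. The helper name stdio_claims is hypothetical and is not part of Twizzler's build tooling, and it recognizes just the two option spellings seen in this issue.

```rust
/// Returns the options in a QEMU argument list that attach a character
/// device to the host's stdio. QEMU refuses to start when more than one
/// option does so. Hypothetical helper: it only knows the `-serial` and
/// `-monitor` spellings that appear in this issue.
fn stdio_claims(args: &[&str]) -> Vec<String> {
    let mut claims = Vec::new();
    // Look at each (option, value) pair of adjacent arguments.
    for pair in args.windows(2) {
        let (opt, val) = (pair[0], pair[1]);
        let is_chardev_opt = matches!(opt, "-serial" | "-monitor");
        let uses_stdio = val == "stdio" || val == "mon:stdio";
        if is_chardev_opt && uses_stdio {
            claims.push(format!("{opt} {val}"));
        }
    }
    claims
}

fn main() {
    // Illustrative argument list combining the flags discussed above.
    let args = [
        "-no-reboot", "-no-shutdown",
        "-serial", "mon:stdio",
        "-monitor", "stdio",
    ];
    let claims = stdio_claims(&args);
    if claims.len() > 1 {
        eprintln!("conflicting stdio users: {claims:?}");
    }
}
```

Keeping only one of the two options avoids the error. Note that -serial mon:stdio already multiplexes the monitor onto the same terminal as the serial port, so a separate -monitor stdio may not be needed at all.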

dbittman commented 2 years ago

@NateHerman can you also provide the CPU model of the machine you're using to test?

NateHerman commented 2 years ago

@circutrider21 I figured out why I was getting the errors I pasted above. I commented out some redundant and un-needed command line options in the rust code that generates the cargo start-qemu command, and went with the options you suggested. Lo and behold:

(qemu) info registers
RAX=00000000000100a0 RBX=0000000000000000 RCX=0000000000777fff RDX=0000000000000000
RSI=0000000000010000 RDI=00000000000100a0 RBP=ffffffff80113e70 RSP=ffffffff80113e68
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8043fec9 RFL=00010092 [--S-A--] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 00000000 00009300 DPL=0 DS [-WA]
CS =0028 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 00000000 00009300 DPL=0 DS [-WA]
DS =0030 0000000000000000 00000000 00009300 DPL=0 DS [-WA]
FS =0030 0000000000000000 00000000 00009300 DPL=0 DS [-WA]
GS =0030 0000000000000000 00000000 00009300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 000000003fefc000 00000037
IDT= 0000000000000000 00000000
CR0=80000011 CR2=0000000000000000 CR3=000000003fe50000 CR4=00000020
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d00
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
XMM08=00000000000000000000000000000000 XMM09=00000000000000000000000000000000
XMM10=00000000000000000000000000000000 XMM11=00000000000000000000000000000000
XMM12=00000000000000000000000000000000 XMM13=00000000000000000000000000000000
XMM14=00000000000000000000000000000000 XMM15=00000000000000000000000000000000
(qemu) qemu-system-x86_64: terminating on signal 2

As a side note, not that it really matters b/c I got QEMU's console working with the second command line method you described, but in the first method, where you described clicking on view in the toolbar, then clicking on "monitor", just to let you know, there wasn't any "monitor" menu item to select. Not sure why. But I got it working the other way, so, good enough.

Hope that helps. As for the Discord server, I'm definitely gonna go there and see if I can get help starting out. I'm somewhat shy, but I gotta start being social with learning programming b/c I keep trying all these different rust tutorials and books and live-coding and I dunno, just somehow the info isn't getting fed in the right way such that I keep going with it, so maybe feeling like I'm talking and learning with actual people will help. I'm newly in with another group of people called Kiloby Inquiries and I'm finding the social aspect is making all the difference there, so maybe the same will happen on the Discord server you gave me a link to. Anyway, thanks for all the help so far.

Oh, and let me know what I need to do next if info registers gave you a clue as to what's going on. Cool. Off to answer dbittman's question.

Sincerely, Nate

NateHerman commented 2 years ago

@dbittman

can you also provide the CPU model of the machine you're using to test?

I did lscpu and it gave me:

Model name: Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz

for the model name. Is that what you needed? Thanks for the help.

Sincerely, Nate

circutrider21 commented 2 years ago

@NateHerman if this is a debug build, running the following command will get you the exact line of the triple fault.

addr2line -e <path to kernel> 0xffffffff8043fec9

I doubt you will get output tho, since the faulting address (which I took from your register dump) is 3MB higher than the kernel base address.

On a related note, @dbittman I recommend you use a linker script instead of just specifying the base address to rustc. This not only makes the kernel smaller, but it opens up other possibilities like having the stack automatically allocated for you from the bss section, and having proper memory segment permissions applied by the bootloader.

dbittman commented 2 years ago

On a related note, @dbittman I recommend you use a linker script instead of just specifying the base address to rustc. This not only makes the kernel smaller, but it opens up other possibilities like having the stack automatically allocated for you from the bss section, and having proper memory segment permissions applied by the bootloader.

Yeah, this is planned.

dbittman commented 2 years ago

@NateHerman yep. That's a somewhat older chip, so I suspect the issue is that I'm turning on some CPU feature without properly checking cpuid to see if it's present.

circutrider21 commented 2 years ago

My guess is that the kernel automatically enables MMX, along with XSAVE, both of which aren't supported on certain CPUs. I am thinking the issue is XSAVE, since it isn't supported on my old celeron (which somehow has VMX support).

dbittman commented 2 years ago

Yeah, that was one issue. It also assumed rdfsbase and friends were present and just blindly enabled them in cr4 :/
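To make the "check cpuid before touching cr4" idea concrete, here is a minimal sketch in Rust. It is not the fix that was merged: the function name and its parameters are hypothetical, and it takes raw CPUID outputs as plain integers so that it runs anywhere. The bit positions themselves come from the Intel SDM: XSAVE is CPUID.01H:ECX[26], FSGSBASE is CPUID.(EAX=07H,ECX=0):EBX[0], CR4.FSGSBASE is bit 16 and CR4.OSXSAVE is bit 18.

```rust
// Feature bits reported by CPUID (Intel SDM).
const CPUID_01_ECX_XSAVE: u32 = 1 << 26; // CPUID.01H:ECX[26]
const CPUID_07_EBX_FSGSBASE: u32 = 1 << 0; // CPUID.(EAX=07H,ECX=0):EBX[0]

// Matching enable bits in CR4.
const CR4_FSGSBASE: u64 = 1 << 16;
const CR4_OSXSAVE: u64 = 1 << 18;

/// Computes which optional CR4 bits are safe to set, given raw CPUID
/// outputs. Hypothetical helper: a real kernel would execute CPUID itself.
/// `max_basic_leaf` is EAX from CPUID leaf 0; leaf 7 is only consulted
/// when the CPU actually reports it.
fn optional_cr4_bits(max_basic_leaf: u32, leaf1_ecx: u32, leaf7_ebx: u32) -> u64 {
    let mut bits = 0;
    if leaf1_ecx & CPUID_01_ECX_XSAVE != 0 {
        bits |= CR4_OSXSAVE;
    }
    if max_basic_leaf >= 7 && leaf7_ebx & CPUID_07_EBX_FSGSBASE != 0 {
        bits |= CR4_FSGSBASE;
    }
    bits
}

fn main() {
    // Made-up CPUID values for a CPU with neither feature.
    let bits = optional_cr4_bits(11, 0, 0);
    println!("optional CR4 bits to enable: {bits:#x}");
}
```

On a CPU that lacks a feature, setting the corresponding CR4 bit raises a general-protection fault, which so early in boot can escalate to the triple fault seen above.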

Once CI passes I'll merge the fix.

NateHerman commented 2 years ago

@dbittman oh nice! I was just typing out a response saying, hey, I totally get if it's an in-the-weeds bug and not too worth spending time on it at this early stage of twizzler. But sounds like a possible fix is forthcoming. I'll try compiling and running it once I see a merge. Thanks so far!

dbittman commented 2 years ago

Merged a fix, try it now.

NateHerman commented 2 years ago

@dbittman qemu's ui just hangs with a blank screen and on the command line I get:

QEMU 4.2.1 monitor - type 'help' for more information
(qemu) qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.8000000AH:EDX.tsc-scale [bit 4]

info registers gives:

info registers
RAX=0000000000000000 RBX=0000000000000000 RCX=ffffffff8013cc50 RDX=0000000000000019
RSI=0000000000000004 RDI=ffffffff8077c340 RBP=ffffffff80112710 RSP=ffffffff801125a0
R8 =0000000000000004 R9 =0000000000000010 R10=ffffffff80115098 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff802d00d2 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0008 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 0000000000000000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0028 ffffffff80579941 00000067 00008b00 DPL=0 TSS64-busy
GDT= ffffffff805799c0 00000037
IDT= ffffffff8077b000 00000fff
CR0=80000011 CR2=ffff800000001ff0 CR3=0000000000001000 CR4=000000a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d00
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
XMM08=00000000000000000000000000000000 XMM09=00000000000000000000000000000000
XMM10=00000000000000000000000000000000 XMM11=00000000000000000000000000000000
XMM12=00000000000000000000000000000000 XMM13=00000000000000000000000000000000
XMM14=00000000000000000000000000000000 XMM15=00000000000000000000000000000000

if that helps.

circutrider21 commented 2 years ago

@NateHerman judging from the dump, it seems that a page fault occurred, but once again, RIP doesn't seem to be right. If I were you, I'd run addr2line on the faulting IP (RIP=xxx in the log trace, like I showed you before). Also, hooking GDB up to QEMU wouldn't be a bad idea either...
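For anyone scripting this step, the sketch below pulls register values such as RIP and CR2 out of pasted info registers text so they can be handed to addr2line -e <path to kernel>. It is written in Rust as an illustration only; parse_reg is a hypothetical helper, and it handles only the NAME=hex form, not spaced entries such as "R8 =..." or "GDT= ...".

```rust
/// Extracts a register value from QEMU `info registers` output.
/// Hypothetical helper: it only matches tokens of the form `NAME=hex`,
/// so entries printed with a space around `=` are not found.
fn parse_reg(dump: &str, name: &str) -> Option<u64> {
    let key = format!("{name}=");
    dump.split_whitespace()
        // Keep the text after `NAME=` for the first matching token.
        .find_map(|tok| tok.strip_prefix(key.as_str()))
        // Register values are printed as bare hexadecimal digits.
        .and_then(|hex| u64::from_str_radix(hex, 16).ok())
}

fn main() {
    // Excerpt of the register dump posted above.
    let dump = "RIP=ffffffff802d00d2 RFL=00000246 [---Z-P-] CPL=0 \
                CR0=80000011 CR2=ffff800000001ff0 CR3=0000000000001000 CR4=000000a0";
    for name in ["RIP", "CR2", "CR4"] {
        match parse_reg(dump, name) {
            Some(value) => println!("{name} = {value:#x}"),
            None => println!("{name} not found"),
        }
    }
}
```

On x86, CR2 holds the address whose access caused the most recent page fault, so a non-zero CR2 next to a suspicious RIP is a useful pair of values to look up.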

dbittman commented 2 years ago

Ah, shoot. I was able to repro the bug on qemu by changing the cpu model, so I was hoping that would be close enough.

Can you post the output of cat /proc/cpuinfo?

yeah page fault seems likely, looks to be in the physical memory map. And yeah, I'll second trying to hook up GDB (this is useful to know how to do anyway)

NateHerman commented 2 years ago

Here's the output of cat /proc/cpuinfo:

n8@n8-ZX6900:~/twizzler-learning/twizzler$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz
stepping : 2
microcode : 0x11
cpu MHz : 3058.695
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm arat flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 6117.36
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz
stepping : 2
microcode : 0x11
cpu MHz : 3058.691
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 2
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm arat flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 6117.36
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz
stepping : 2
microcode : 0x11
cpu MHz : 3058.703
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm arat flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 6117.36
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz
stepping : 2
microcode : 0x11
cpu MHz : 3058.685
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 2
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm arat flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 6117.36
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

NateHerman commented 2 years ago

@circutrider21 I just noticed your addr2line suggestion:

addr2line -e /boot/vmlinuz-5.4.0-26-generic ffffffff802d00d2
addr2line: /boot/vmlinuz-5.4.0-26-generic: warning: ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
??:0

I'm gonna see if I can figure out how to hook up GDB

dbittman commented 2 years ago

Okay, if I'm reading those flags right, your CPU doesn't support 1G pages, which the mapping code assumed, and mapped physmem with 1G pages. I've added a feature flag test for this. See if you can pull the branch in #80 and let me know if that improves things.
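As an illustration of the kind of test described here, the sketch below picks a page size for the physical memory map based on whether 1 GiB pages are advertised. It is not the code from #80: the enum and function names are hypothetical, and it reads the Linux /proc/cpuinfo flag name (pdpe1gb, which corresponds to CPUID.80000001H:EDX[26]) from a string, whereas a kernel would query CPUID directly.

```rust
/// Page size to use for the physical memory map (hypothetical type).
#[derive(Debug, PartialEq)]
enum PhysMapPage {
    Size2M,
    Size1G,
}

/// True if `flag` appears as a whole word in a cpuinfo-style flags string.
fn has_flag(flags: &str, flag: &str) -> bool {
    flags.split_whitespace().any(|f| f == flag)
}

/// Falls back to 2 MiB pages, which 64-bit paging always supports,
/// unless the CPU advertises 1 GiB pages.
fn pick_physmap_page(flags: &str) -> PhysMapPage {
    if has_flag(flags, "pdpe1gb") {
        PhysMapPage::Size1G
    } else {
        PhysMapPage::Size2M
    }
}

fn main() {
    // Start of the flags line posted above; it contains no pdpe1gb entry.
    let flags = "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36";
    println!("physmap page size: {:?}", pick_physmap_page(flags));
}
```

Matching whole words matters here because names such as pse36 contain pse as a prefix, so a plain substring search would give false positives.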

NateHerman commented 2 years ago

@dbittman Yes, that definitely improves things. Although, (keep in mind I'm new to this) I was expecting the OS to load up in QEMU's UI. But it looks like it just loads in the terminal:

BdsDxe: failed to load Boot0001 "UEFI QEMU DVD-ROM QM00005 " from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x2,0xFFFF,0x0): Not Found
BdsDxe: loading Boot0002 "UEFI QEMU HARDDISK QM00001 " from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x0,0xFFFF,0x0)
BdsDxe: starting Boot0002 "UEFI QEMU HARDDISK QM00001 " from PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x0,0xFFFF,0x0)
[kernel] boot with cmd `'
[kernel::mm] initializing memory management
[kernel::debug] parsing kernel debug image
remap 2 32
setting 2 32 masked=false
remap 5 37
setting 5 37 masked=false
remap 9 41
setting 9 41 masked=false
remap 10 42
setting 10 42 masked=false
remap 11 43
setting 11 43 masked=false
[kernel::cpu] enumerating and starting secondary CPUs
[kernel::initrd] loading module...
[kernel::initrd] loading "init" -> 10000000000000001
[kernel::initrd] loading "devmgr" -> 10000000000000002
[kernel::initrd] loading "netmgr" -> 10000000000000003
[kernel::initrd] loading "nettest" -> 10000000000000004
[kernel::arch::x86-pit] setting up for statclock with freq 127 (7 ms)
setting 4 36 masked=false
[kernel::machine::pcie] init
[kernel::main] processor 0 entering main idle loop
[init] starting userspace
Hello, World 42
waiting for network manager to come up
Hello from netmgr
network manager is up!
Hi, welcome to the basic twizzler test console.
If you wanted line-editing, you've come to the wrong place.
A couple commands you can run:

in any case, just bumbling along, I see on your "Developing for Twizzler" page that when I build and run programs I run them from the ">" prompt and it returns got: "<"something r other">" . . . so I guess this is the extent of how to interact with the OS at this point? Just start writing programs and get output there? If so, sweet. I'm pumped. But if I'm supposed to be seeing something in QEMU's UI let me know.

dbittman commented 2 years ago

Great! Yeah, that's currently it. I have some major features planned and some implemented that I'll hopefully get in soon that will add a lot more OS services, but for now that's what Twizzler boots up to. Eventually I plan to get a terminal working on the framebuffer but there are only so many hours in the day :)

I've merged the fix into main.

Good luck!

NateHerman commented 2 years ago

@dbittman SOGREAT . . . I answered my own question and got the hello world running:

run hello
got: Hello, Twizzler!

Thanks sosososo much! So much coolness to learn! I'm so psyched. I partially held off on learning OS's & Programming b/c the "processes" model never made sense to me. Twizzler just feels conceptually natural to me. So I'm excited to dig in! Thank you sosososo much for helping me get there.

@circutrider21 thanks for your help as well. I'll probably say hi on the discord server at some point. Thanks again!

circutrider21 commented 2 years ago

@NateHerman Glad to see that you've solved the problem 😀, as for the discord, I'll see you there :-)