phil-opp / blog_os

Writing an OS in Rust
http://os.phil-opp.com
Apache License 2.0
14.26k stars 1.01k forks source link

Comments for "https://os.phil-opp.com/paging-implementation/" #570

Closed utterances-bot closed 3 years ago

utterances-bot commented 5 years ago

This is a general purpose comment thread for the Paging Implementation post.

lilyball commented 5 years ago

The bootloader and x86_64 changelog links are swapped.

phil-opp commented 5 years ago

@lilyball Thanks! Fixed in cc0f684.

DanGdl commented 5 years ago

This paging approach seems much clear and easy to understand

phil-opp commented 5 years ago

@DanGdl Great to hear that!

pitust commented 5 years ago

That's better than last time (I hope). Last time the code didn't compile.

phil-opp commented 5 years ago

@pitust Please open an issue when you have any problems with compiling the code. You can also find the complete code for the post in the post-10 branch. This branch is automatically tested on every new nightly using different continuous integration services, so it should always compile without problems.

songzhi commented 5 years ago

If I use the 'map_physical_memory' feature,and I want to create a new level_4_page_table to be switched to when I create a new process.Do I need to map the whole physical memory in the new page table manually just like what you did in 'bootloader' crate?Actually I found that snippet of code,but I am not really sure should I type it again in my code.

phil-opp commented 5 years ago

@songzhi If you want that your new process also has access to the physical memory then you need to add a mapping for it in its address space. As an optimization you can link the level 3 or level 2 page table from your current address space to the new address space instead of copying them, so that both address spaces share the same level 3/level 2 table.

krsoninikhil commented 5 years ago

For anyone who is updating x86_64 crate from 0.4.0 to required 0.5.2 and getting error like try_from not being stable, you also need to update your rustc version (rustup update). After this, you'll also be required to update your code as per new version of x86_64 which will be suggested by compiler itself or you can refer this commit.

redamith2 commented 5 years ago

How to create global static frame_allocator so that I can move the page mapping inside the page fault handler. I have tried

In lib.rs

pub static mut FRAME_ALLOCATOR: Option<BootInfoFrameAllocator<Iterator> = None

I keep getting an error "doesn't have a size known at compile-time"

phil-opp commented 5 years ago

How to create global static frame_allocator so that I can move the page mapping inside the page fault handler. I have tried […]

Discussed in https://github.com/phil-opp/blog_os/issues/593

izzabelle commented 5 years ago

Just wanted to mention that the src/memory.rs lacks the inclusion of a use bootloader::bootinfo::{MemoryMap, MemoryRegionType}; when you get to talking about the writing the boot frame allocator. I know it's a tiny thing but it threw me for a loop the first time as my first guess would have had it somewhere in the paging structures in the x86_64 crate and not the bootloader.

phil-opp commented 5 years ago

@nrlsk Thanks for reporting! Fixed in 6db5ad78ab2f8a10e23e5e22639eddb03fb39232.

phil-opp commented 5 years ago

@tom7980 Seems like you deleted your comment. It was a valid question, so let me give you a short answer anyway.

Yes, the frame allocator could be implemented this way. We used a similar implementation before. The problem is that it couldn't be stored in a static then because we can't name the generic I parameter, because it depends on unnamable closure types. See https://github.com/phil-opp/blog_os/issues/593 for a discussion of the problem.

I'm struggling to test it myself as I'm having trouble building all the dependencies with xbuild.

Could you open an issue for this with more specific information?

tom7980 commented 5 years ago

@phil-opp I can open an issue about it tomorrow as I'm on the wrong computer currently, I was only struggling on my windows environment, once I booted into Linux and tried the build steps I managed to sort it so it might just be my windows env is mis-configured.

Regarding my question, I actually had a go at implementing it but I came up against the issue that the initialisation of the wrapper was expecting a type I where I: Iterator, but due to passing a memory map to the function and doing some transformations onto it I was getting a nested iterator and it wouldn't let me store it anyway!

I'm going to take another look at it tomorrow but I think I understand what you meant about the static still trying to wrap my head around Traits and Generic Types, and to be fair most of Rust!

phil-opp commented 5 years ago

@tom7980 Thanks for elaborating!

once I booted into Linux and tried the build steps I managed to sort it so it might just be my windows env is mis-configured.

I just want to ensure that xbuild/bootimage works on all OSs. If there is some configuration problem with Windows we could at least document it to help others.

I'm going to take another look at it tomorrow but I think I understand what you meant about the static still trying to wrap my head around Traits and Generic Types, and to be fair most of Rust!

Let me/us know if you have any questions, we're always happy to help!

tiehexue commented 5 years ago

" We can't start the region at 28 KiB because it would collide with the already mapped page at 1004 MiB."

Is that "1004 Mib" should be "1000 Kib" ?

phil-opp commented 5 years ago

@tiehexue Yes, thanks for reporting! Fixed in a8908acecc0de60d69c569345db6f3eb24d63431.

lenenel commented 5 years ago

In Temporary Mapping:

Now the kernel can access the level 2 page table by writing to page 0 KiB and the level 4 page table by writing to page 33 KiB.

Should it be 36 KiB for level 4 page? Or I missing something?

phil-opp commented 5 years ago

@lenenel You're right, thanks for reporting. Fixed in 2a0e5e469674071d85169ca73a261cb51c8be1e0.

Barronli commented 4 years ago

@phil-opp Thanks for your great efforts in making the wonderful series! I have a few probably silly questions:

  1. The complete-mapping mechanism uses fixed_offset + physical_addr to get the virtual_addr. Then one should be able to use virtual_addr - fixed_offset to get the physical_addr, as long as the page is mapped, which could be indicated with a PRESENT bit in a pre-mapped page (1bit for 4KB) without the need to walk through the page table hierarchy as given in translate_addr_inner. Am I missing something?

  2. Related to the question above, the VGA text buffer in the post uses identity mapping, is it excluded from the "complete-mapping" mechanism? Or does it mean 0xb8000 can be accessed with virtual addresses of both 0xb8000 and 0xb8000 + fixed_offset?

  3. How do you debug the code for the OS development?

Appreciate your response. Thanks for your time!

bjorn3 commented 4 years ago
  1. How do you debug the code for the OS development?

https://os.phil-opp.com/set-up-gdb/

phil-opp commented 4 years ago

@Barronli You're welcome, I'm glad you like it!

The complete-mapping mechanism is only used for accessing physical memory. There can be other virtual mappings in the page table that point to the same physical frames. For example, the VGA text buffer is mapped at both 0xb8000 and 0xb8000 + fixed_offset. This means two things: First, we can't translate an arbitrary virtual address by just subtracting the fixed_offset since this only works if the virtual address is part of the complete-mapping. Second, it is important that we do not access arbitrary memory in the complete-mapping because it might lead to undefined behavior if the same memory is mapped at a different virtual address to (it effectively creates two &mut references that point to the same memory).

Regarding debugging, there are a few useful techniques. For example, you can use QEMU's -d int flag to retrieve the register content on every exception/interrupt. Through the content of the instruction pointer register, you can then find the instruction that caused the exception in the disassembly of your kernel binary. You can also use the QEMU monitor or GDB for full debugger support, as @bjorn3 mentioned. I'm planning a "Debugging" post that covers this topic, but it will probably take some time until I get to it.

I hope this helps!

siebenHeaven commented 4 years ago

@phil-opp Before I get to my comment - great set of posts - incredibly simple to understand and follow and a good intro to rust. Can see a lot of efforts that you have put into this - can't appreciate enough.

Now onto some failures that I see with updated dependencies:

Building bootloader Downloaded x86_64 v0.7.2 Compiling semver-parser v0.7.0 Compiling cc v1.0.37 Compiling cast v0.2.2 Compiling bootloader v0.6.4 (/home/anup/.cargo/registry/src/github.com-1ecc6299db9ec823/bootloader-0.6.4) error[E0461]: couldn't find crate core with expected target triple x86_64-bootloader-1691626545719462551 | = note: the following crate versions were found: crate core, target triple x86_64-bootloader-8482436261164568872: /home/anup/OSDev/rust_os/target/bootimage/bootloader/bootloader-sysroot/lib/rustlib/x86_64-bootloader/lib/libcore-c2b176c4b831be55.rlib crate core, target triple x86_64-bootloader-8482436261164568872: /home/anup/OSDev/rust_os/target/bootimage/bootloader/bootloader-sysroot/lib/rustlib/x86_64-bootloader/lib/libcore-c2b176c4b831be55.rmeta

error: aborting due to previous error

error: Could not compile cast. warning: build failed, waiting for other jobs to finish... error: build failed Error: Bootloader build failed:

Did I miss something / do something wrong?

phil-opp commented 4 years ago

@siebenHeaven Thanks for the praise, it's great to hear that you like it!

I already encountered this error a few times. I think the problem is that cargo-xbuild fails to remove old versions of the sysroot after the target JSON changed, but I haven't had the time to debug it yet. As a workaround, try to do a cargo clean. This should clean all compilation artifacts and so that a subsequent build recompiles core for the correct target.

siebenHeaven commented 4 years ago

@phil-opp Thanks for the instant help, that worked!

htbf commented 4 years ago

I got around the type awkwardness / iterator inefficiency for the allocator as follows:

pub struct BootInfoFrameAllocator<T: Iterator<Item = PhysFrame>>(T);

impl BootInfoFrameAllocator<core::iter::Empty<PhysFrame>> {
    pub unsafe fn new(memory_map: &'static MemoryMap)
        -> impl FrameAllocator<Size4KiB>
    {
        BootInfoFrameAllocator(
            memory_map.iter()
            .filter(|r| r.region_type == MemoryRegionType::Usable)
            .map(|r| r.range.start_addr()..r.range.end_addr())
            .flat_map(|r| r.step_by(4096))
            .map(|addr| PhysFrame::containing_address(PhysAddr::new(addr)))
        )
    }
}

unsafe impl<T> FrameAllocator<Size4KiB> for BootInfoFrameAllocator<T>
    where T: Iterator<Item = PhysFrame>
{
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        self.0.next()
    }
}

It's not perfect, and implementing new for Empty is a slightly weird artifact, but since only one implementation for new exists, you don't need to type annotate it and you can just call

let mut frame_allocator = unsafe {
    memory::BootInfoFrameAllocator::new(&boot_info.memory_map)
};
phil-opp commented 4 years ago

@htbf We used this approach in the first version of the post. The problem is that the generic type can't be named since it contains closures. For this reason, we changed it to the non-generic version to allow storing the BootInfoFrameAllocator in static variables more easily.

I plan to switch back to this version as soon as the Permit impl Trait in type aliases RFC is fully implemented. Then we could use an approach like this to store a generic BootInfoFrameAllocator in a static variable.

elliott-wen commented 4 years ago

I am pretty rusty in Rust. (Pun).

Just wondering why in the following statement

// read the page table entry and update frame let entry = &table[index];

we need & before table[index];?

Cheers

bjorn3 commented 4 years ago

If you dont use that & you are telling the compiler that you want the specific value at that offset, instead of a reference to it. PageTableEntry however doesnt implement Copy, so it cannot copy it,m and moving it is not allowed for indexing, as the compiler doesnt know at compile time which entry got moved. Because it can't copy and can't move the PageTableEntry, it gives a compile error.

elliott-wen commented 4 years ago

@bjorn3 Thank you, that's pretty helpful. Kindly let me ask another silly question.

for &index in &table_indexes

I understand the second & means getting the iter() of the table_indexes,

but how about the first & in &index? I got quite confused

bjorn3 commented 4 years ago

Behind the scenes for is syntax sugar for a while loop + matching on the result of iter.next(). This means that you can use patterns before the in, just like with matches.

The & of &index means that the item is a reference, and you want to get the value behind the reference:

match Some(1u32) { // match on Option<u32>
    Some(x) => ..., // x is of type u32
    None => ...
}

match &1u32 { // match on &u32
    &x => ... // x is of type u32
}
elliott-wen commented 4 years ago

Thanks it is really helpful.

Just one more question, getting the value behind the reference has nothing to do with ownership right? Sorry, as an old C++ programmer, I feel so confused about all the generation alpha concepts. :D

Cheers Elliott

On Wed, 23 Oct 2019 at 22:53, bjorn3 notifications@github.com wrote:

Behind the scenes for is syntax sugar for a while loop + matching on the result of iter.next(). This means that you can use patterns before the in, just like with matches.

The & of &index means that the item is a reference, and you want to get the value behind the reference:

match Some(1u32) { // match on Option Some(x) => .... // x is of type u32 None => ... } match &1u32 { // match on &u32 &x => ... // x is of type u32 }

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/phil-opp/blog_os/issues/570?email_source=notifications&email_token=ABLPC3A4QP7AHIF5JKTLBG3QQANJFA5CNFSM4G6WZXEKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECAZYNA#issuecomment-545365044, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLPC3D23R5PRYYJNPIN2Z3QQANJFANCNFSM4G6WZXEA .

phil-opp commented 4 years ago

@elliott-wen In general it's not allowed to take the value behind a reference because it takes ownership, which is not allowed for references (aka "borrows"). What you can do, however, is to duplicate the value, while leaving the original value in place and thereby preserving ownership.

Depending on the value, there are different ways to duplicate it. Some values implement the Clone trait, which allows explicitly duplicating the value through a clone method. Other values, such as integer types, additionally implement the Copy trait, indicating that the value can be copied simply by copying its memory representation (i.e. there is no additional code required). For values that implement Copy, Rust performs the duplication automatically so that no explicit clone call is necessary.

In the table index example, the table index types implement the Copy trait. This is the reason that simply taking the value behind the returned reference is allowed without violating the ownership rules: We implicitly duplicate the value instead of taking ownership of the referenced value.

I hope this helps!

andyhhp commented 4 years ago

For recursive pagetables, there is a massive disadvantage which I've never seen described in hobby kernel discussions.

When accessing pagetable entries in the recursive part of linear address space, you create structure cache and TLB entries for each memory location you access (even speculatively, so all bets off in practice).

These are not flushed by the INVLPG instruction because, while INVLPG flushes the paging structure cache (a relic of 486 processor), it doesn't flush fully translated TLB entries, other than the target of the instruction.

As a result, if you modify pagetable entries in the recursive section, use INLVPG, then want to come and read/write a pagetable entry later, you may end up operating on the stale mapping created the first time around, not the current one.

This is a subtle TLB flushing bug which, depending on the size of the TLB, and how much the processor speculates in the meantime, may cause reads/writes into the recursive part of the pagetable hit the wrong memory.

Recursive pagetables are a neat idea, that are far too clever for their own good. IMO, the only people qualified to use this technique are the ones who know their life will be better if they don't.

foxcob commented 4 years ago

Thank you so much for the work you've done on this blog and continuing to share it. I've been following along and creating my own kernel with some differences from what you've done, but so far it has been very successful thanks to the work you've done and the crates you're providing. I do have one question regarding the mapper and the restriction (even if not strictly enforced) to using UnusedPhyFrame. I find that this is incompatible with a number of kernel concepts. For instance:

  1. Mapping physical memory to a logical address range for performing things like offset page table acesss. This necessarily implies that most frames will be mapped twice. Once to there assigned virtual address and once to the mapping of physical memory.
  2. User space shared memory. While the kernel may not use this technique for safety reasons, it is a common user space facility to allow user processes to create shared memory regions for IPC or other purposes.
  3. Shared libraries. Probably the safest of the examples, multiple read-only mappings of frames to map libraries into a processes virtual memory space.
  4. Copy-on-write: multiple mappings to a frame that will be copied to a new frame if written to.

There are probably more, but that's enough to make my point. Is this something you've considered or is there maybe a solution that I'm not considering? I know that I can easily bypass the restriction, but then it directly ignores an obvious condition that the method communicates through the passed parameter types. Would love to hear yours or others thoughts on this.

phil-opp commented 4 years ago

@foxcob Sorry for replying so late!

Thank you so much for the work you've done on this blog and continuing to share it. I've been following along and creating my own kernel with some differences from what you've done, but so far it has been very successful thanks to the work you've done and the crates you're providing.

Great to hear that!

I do have one question regarding the mapper and the restriction (even if not strictly enforced) to using UnusedPhyFrame. I find that this is incompatible with a number of kernel concepts.

We added the UnusedPhysFrame type in https://github.com/rust-osdev/x86_64/pull/89 in order to make the map_to method safe. I agree that your examples are completely valid and that the requirement that the frame must be unused is too strict. I opened https://github.com/rust-osdev/x86_64/issues/123 for this and outlined two possible solutions there.

phil-opp commented 4 years ago

@andyhhp Thanks a lot for your comment and also sorry for the late reply!

As a result, if you modify pagetable entries in the recursive section, use INLVPG, then want to come and read/write a pagetable entry later, you may end up operating on the stale mapping created the first time around, not the current one.

Just to ensure that I understand it correctly: Such a stale mapping should only happen if change the mapping of a page table itself, e.g. if you change a level 3 entry to point to a different level 2 table, right? Changes to mappings in a level 1 table are not problematic because the page address is flushed using invlpg. Or am I missing something?

I try to expand the post to mention these potential problems.

Recursive pagetables are a neat idea, that are far too clever for their own good.

I agree. I like them because they utilize the multilevel page table structure in a interesting way, but the approach is too complicated in practice (for a variety of reasons).

phil-opp commented 4 years ago

@andyhhp I opened https://github.com/phil-opp/blog_os/pull/740 to mention this drawback on the blog. Let me know what you think!

airt commented 4 years ago

In Map the Complete Physical Memory:

This way, mapping 32 GiB of physical memory only requires 132 KiB for page tables since only one level 3 table and 32 level 2 tables are needed.

Should it be "one level 4 table and 32 level 3 tables" ?

phil-opp commented 4 years ago

Each level 2 table can manage 512 * 2MiB = 1GiB of memory. So for 32 GiB physical memory, we need 32 level 2 tables, one level 3 table, and one level 4 table. I didn't mention the level 4 table because you always need it and there can be only a single instance of it anyway.

I think the main point of this sentence was that you don't need to create level 1 tables when using huge pages. This results in large memory savings because you would need 32*512 = 16384 level 1 tables for mapping 32GiB otherwise, which would require 64MiB of memory for storing just the level 1 tables themselves.

thorlucas commented 4 years ago

This is a great post! But I'm still rather confused. All of these solutions seem like hacks/workarounds. I have a hard time believing that most "real-world" kernels use any of these solutions, or that they would choose to waste any memory on this.

Why and how are these "real world" kernels able to access physical addresses when ours cannot?

phil-opp commented 4 years ago

According to this stack overflow answer, Linux identity-maps all physical memory (at least on 32-bit). So it basically uses the Map the Complete Physical Memory approach with an offset of 0.

Being able to control the access to physical memory is very important for designing a safe OS kernel, so I think it's a good thing that there are no ways to bypass the page tables. This way, we can exactly control who is allowed to access a specific page table and who is not. Yes, there is some wasted memory due to this, but it's only a small percentage of the memory amout and memory protection is definitely worth it.

thorlucas commented 4 years ago

Huh, that's very interesting. I did not expect that. Thanks!

kennystrawnmusic commented 4 years ago

The code for the usable_frames function produces an error now that UnusedPhysFrame has been deprecated in favor of use of PhysFrame exclusively: ‘error[E0277]: ‘()’ is not an iterator`

Really bizarre message given that the object you’re attempting impl Iterator is a PhysFrame and not a ().

thorlucas commented 4 years ago

@realKennyStrawn93 I followed this tutorial and did not get any errors. What version of x86_64 are you using?

kennystrawnmusic commented 4 years ago

Update: turns out I forgot to remove a semicolon after stripping the last line from the function. Now it works fine.

drzewiec commented 3 years ago

I think this is more of a Rust question than an OS dev question, but I'm confused about the use of unsafe around the page table operations. For example, instead of the way the blog post has, you could define active_level_4_table this way:

pub fn active_level_4_table(physical_memory_offset: VirtAddr)
    -> &'static mut PageTable
{
    // snip
    unsafe { &mut *page_table_ptr } // unsafe
}

Which then lets you call the function from outside an unsafe block, and is more explicit about what part of the function is unsafe. I assume that there's a reason to make the entire function unsafe, but I'm not seeing it. Can someone clarify this point?

thorlucas commented 3 years ago

@drzewiec As far as I can understand it's about signalling that the operation being performed is unsafe. A call to this function is always unsafe. Marking the whole function tells the developer that when they call this function, they have to be careful about the parameters they pass in to make sure that the call is safe.

If the developer calling the function knows that they're passing in safe parameters, then the function that is in does not need to be marked as unsafe because they've guaranteed safety.

If instead the active_level_4_table function made some checks that could guarantee that this function would never return an unsafe value, then the function could be safe but just the operation marked unsafe.

drzewiec commented 3 years ago

I think I understand. Essentially the reasoning here is that active_level_4_table, as a public function, is part of our API and so if we aren't doing checks to ensure safety we should signify that to the caller through marking the function unsafe?