Open langyuxf opened 10 months ago
Word from the source is that this is unsupported by design. (Forking a process using a vGPU that is)
Can you elaborate on this? What is unsupported?
Hi @pmartincic ,
I'm not forking a process(child process) using a vGPU, I mean the parent process should still be able to use the vGPU. But now the parent process can't use vGPU when using get_userpages*.
Regards, Lang
Ahhh, I'm sorry I misread what you wrote. That's my fault. I'll start the conversation internally again.
Please see https://github.com/torvalds/linux/blob/master/Documentation/core-api/pin_user_pages.rst#page-maybe-dma-pinned-the-whole-point-of-pinning https://github.com/torvalds/linux/blob/master/Documentation/core-api/pin_user_pages.rst#another-way-of-thinking-about-foll-get-foll-pin-and-foll-longterm https://github.com/torvalds/linux/blob/master/include/linux/mm.h#L1967
Hi @pmartincic,
Any updates?
Thanks, Lang
Do you have a sample/repro/minimal app/code sample you can post? I don't work with/on these features normally, is a little out of my wheelhouse and I want to perform a sanity check before I pass it along.
(Edit: they agree the parent process should still work)
It's hard to put all sample codes here to reproduce the issue.
What do you want to check?
Hi @pmartincic,
Have you guys seen these docs?
https://github.com/torvalds/linux/blob/master/Documentation/core-api/pin_user_pages.rst#page-maybe-dma-pinned-the-whole-point-of-pinning https://github.com/torvalds/linux/blob/master/Documentation/core-api/pin_user_pages.rst#another-way-of-thinking-about-foll-get-foll-pin-and-foll-longterm https://github.com/torvalds/linux/blob/master/include/linux/mm.h#L1967
Especially, the following
/*
* This should most likely only be called during fork() to see whether we
* should break the cow immediately for an anon page on the src mm.
*
* The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
*/
static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
struct folio *folio)
{
VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));
if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
return false;
return folio_maybe_dma_pinned(folio);
}
Regards, Lang
Trust but verify.
I don't work on this area of the product. My plate is pretty full. I don't have time to write a repro. Somebody who does know this area is willing to look at it. I have a limited amount of political capital. This person helping owes me nothing and it's not their responsibility to do what I ask. I will pass this bug along to them if you give a repro that I can sanity check.
I'm working on some closed source work on AMD hardware and It's hard for me to expose the codes.
The bug is actually clear.
You're missing the point. I'm probably not going to dig into this. The person I'm talking to who has the capacity to, would like a sample repro.
For their sake please make one.
Thanks,I got you point.
But it's inconvenient for me to expose a sample to reproduce which includes some closed source parts.
Can I talk with that person?If they have some questions.
I create a repro demo and hide hardware driver specific details. You guys must use AMD gfx9 GPU and above.
https://github.com/langyuxf/dxgkrnl_linux.git
Thanks Lang
Hi @pmartincic @iourit,
We observed similar issues under memory pressure. Pages acquired via get_user_pages_fast() are reclaimed by linux memory management system while GPU page tables are not updated accordingly. Then the GPU still operates on the original pages.
Have you guys reproduced the issue? Thanks.
Regards, Lang
This should be fixed with https://github.com/microsoft/WSL/releases/tag/2.2.3.
Windows Version
10.0.22631.3085
WSL Version
2.0.9.0
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.133.1-1
Distro Version
Ubuntu 22.04
Other Software
No response
Repro Steps
1, use mmap to allocate a PRIVATE + ANONYMOUS CPU memory 2, use dxgkio_create_allocation and the CPU memory to create a GPU allocation 3, fork a child process, then write to this memory from CPU side in parent process
Expected Behavior
The backing store physical pages of CPU VA in parent process should be still mapped in GPU page table.
Actual Behavior
Due to fork and write from parent process, copy-on-write is triggered, new physical pages are allocated for parent process which are not mapped in GPU page table.
Diagnostic Logs
No response