Closed datacore-rm closed 9 months ago
Is this for the old port called ZFSin ? Seems like that would belong in https://github.com/openzfsonwindows/ZFSin , but I don't think that is actually still being developed, last commit was more than a year ago.
It's an error from their own repository, and they still use the old name: https://github.com/DataCoreSoftware/openzfs
One option: the DriverEntry() routine could fail with STATUS_INSUFFICIENT_RESOURCES if it cannot allocate the required resources. But then we would also have to roll back the resources that were already allocated?
Hmm, that is peculiar that `spl_free_set_pressure()` is called so very early, but if it already has waiters it looks like it can be. We could just make `spl_free_set_pressure()` skip the

```
mutex_enter(&spl_free_thread_lock);
cv_signal(&spl_free_thread_cv);
mutex_exit(&spl_free_thread_lock);
```

part until the free thread is up and running. We could start the free thread earlier, but I suspect that would be more complicated.
If I understand correctly, the idea is to retry continuously until the allocation succeeds? Then we also have to skip `cv_wait(&vmp->vm_cv, &vmp->vm_lock);`, where the thread would otherwise wait forever.
e.g.

```c
if (spl_free_thread_running) {
	spl_free_set_pressure(0);
	spl_free_set_and_wait_pressure(...);
	cv_wait(&vmp->vm_cv, &vmp->vm_lock);
}
```
Oh, are we looking at two problems, perhaps? The panic above is relatively easy to fix: the code sets that it wants pressure, then tries to wake the "free thread". The "free thread" is not running, so it panics. Skipping the wake-up should be fine, as there is no thread, and when it is started, it will see the pressure.
During init there shouldn't really be any reason for an alloc to fail (on a 32G system), so that is curious. There was a bug where large (12G+) systems would crash, due to settings in abd_os.c: in particular, the allocation had to be KM_NOTOUCH.
Thanks. I shall test this change.
This is a VM in Hyper-V. The configured RAM is dynamic (not reserved), and the currently assigned amount is only ~2.5GB.
```
Current:          5242880 Kb   Free Space:  5022708 Kb
Minimum:          5242880 Kb   Maximum:    13025020 Kb
Physical Memory:      645874 (    2583496 Kb)
Available Pages:       63893 (     255572 Kb)
ResAvail Pages:       450296 (    1801184 Kb)
Locked IO Pages:           0 (          0 Kb)
Free System PTEs: 4295016119 (17180064476 Kb)
```
Thanks. One related query: in the above call stack, `vmem_xalloc(spl_default_arena, size=1GB, align_arg=1GB, phase=0, nocross=0, minaddr=NULL, maxaddr=NULL, vmflag=VM_SLEEP)` is called with an allocation size of 1GB. But since (align > aquantum), it actually tries to allocate 2*size - quantum = 2GB - 4K!
Oh, thanks for reminding me: rottegift sat down and cleaned up the init, to figure out why we suddenly needed NOTOUCH. I will add it.
https://github.com/openzfsonosx/openzfs-fork/commit/98ea04b2a4d22187e89f0e8e6d6ceaa7e3447151
9111a17
This is OK to close?
Yes, thanks so much.
Panic during driver install/initialize on a VM (with 32GB RAM). The memory allocation failed in `vmem_xalloc()` for `spl_default_arena`. Then, while trying to set the memory pressure, the condition variable `spl_free_thread_cv` (which was not yet initialized) was used, and this caused the panic.

```
vmem_init()
  => vmem_xalloc(spl_default_arena, size=1GB, align_arg=1GB, phase=0,
                 nocross=0, minaddr=NULL, maxaddr=NULL, vmflag=VM_SLEEP)
  => spl_free_set_pressure(0);
     // Similarly, spl_free_set_and_wait_pressure() called on the next
     // line also uses the uninitialized mutex spl_free_thread_lock
  => cv_broadcast(&spl_free_thread_cv);
  => panic("%s: not cv_initialised", func);
```

`spl_free_thread_cv` is initialized later in the code flow: spl_start() => spl_kmem_thread_init() => cv_init(&spl_free_thread_cv).