openzfsonwindows / ZFSin

OpenZFS on Windows port
https://openzfsonwindows.org
1.2k stars 69 forks source link

Fix arc kstat updates #310

Closed raj-kumar-soni closed 3 years ago

raj-kumar-soni commented 3 years ago

This patch partially reverts commit 88d37b7a1ae78ae36bfac16e7602885f54b266cc The issue was that the os_mem_alloc was keeping on increasing when IO workload was running, not honouring zfs_arc_max and zfs_arc_meta_limit.

chid commented 3 years ago

Looks like this will fix #299

lundman commented 3 years ago

Ah yes - the logic for setting zfs_arc_max is no longer run, but the reason it was taken out, is that ksp when called is for osx_kstat_update() is a osx_kstat_t osx_kstat, then we pass that to arc_update() as if it is kstat_t *arc_ksp. Quite naughty. A quick fix would be to change to:

    arc_kstat_update(arc_ksp, rw);
    arc_kstat_update_osx(ksp, rw);
raj-kumar-soni commented 3 years ago

Ah yes - the logic for setting zfs_arc_max is no longer run, but the reason it was taken out, is that ksp when called is for osx_kstat_update() is a osx_kstat_t osx_kstat, then we pass that to arc_update() as if it is kstat_t *arc_ksp. Quite naughty. A quick fix would be to change to:

  arc_kstat_update(arc_ksp, rw);
  arc_kstat_update_osx(ksp, rw);
raj-kumar-soni commented 3 years ago

With above changes (using arc_ksp instead of ksp in arc_kstat_update() ) not able to install ZFSin as arc_ksp initialize after this. It crashes with stack trace:-

ZFSin!arc_kstat_update+0xd [C:\ZFS-vt-code\ZFSin\ZFSin\zfs\module\zfs\arc.c @ 7329] ZFSin!osx_kstat_update+0x5a7 [C:\ZFS-vt-code\ZFSin\ZFSin\zfs\module\zfs\zfs_kstat_windows.c @ 430] ZFSin!kstat_osx_init+0xa0 [C:\ZFS-vt-code\ZFSin\ZFSin\zfs\module\zfs\zfs_kstat_windows.c @ 625] ZFSin!DriverEntry+0x2e [C:\ZFS-vt-code\ZFSin\ZFSin\driver.c @ 75] ZFSin!FxDriverEntryWorker+0xa9 [d:\5359\minkernel\wdf\framework\kmdf\src\dynamic\stub\stub.cpp @ 287] nt!IopLoadDriver+0x4bd nt!IopLoadUnloadDriver+0x4a nt!ExpWorkerThread+0x16a nt!PspSystemThreadStartup+0x55 nt!KiStartSystemThread+0x1c

lundman commented 3 years ago

Yeah, we would need to find a solution. Although, the zfs_arc_max and friends are updated in arc_kstat_update_osx(), perhaps the answer is to just bring that one back.

imtiazdc commented 3 years ago

@lundman Is something wrong with this PR? Or the build? Please let me know how we can fix - "All checks have failed".

If nothing is wrong with the PR, can this change be approved?

lundman commented 3 years ago

Ah yes, it is wrong and I was expecting an update. But it might be easier if I update it? We do need to set the values, this is true, but the old code passes the wrong struct to arc update - its pure luck that struct is big enough to fit all the writes.

imtiazdc commented 3 years ago

But it might be easier if I update it?

@lundman It would be wonderful if you could provide the update. We will be happy to test it and update you with our findings. We don't have enough awareness about that part of code to fix it ourselves in the near term.

lundman commented 3 years ago

It comes down to: static int osx_kstat_update(kstat_t *ksp, int rw) which expects that ksp to be:

typedef struct osx_kstat {
    kstat_named_t spa_version;
    kstat_named_t zpl_version;

and that update function does indeed update those members. I believe this is correct (hurrah), but then we call

        arc_kstat_update(ksp, rw);
        arc_kstat_update_osx(ksp, rw);

using the same ksp.

In this case: arc_kstat_update(kstat_t *ksp, int rw) expects the struct

typedef struct arc_stats {
    kstat_named_t arcstat_hits;
    kstat_named_t arcstat_misses;
    kstat_named_t arcstat_demand_data_hits;

So when arc_kstat_update() updates arcstat_hits it is actually overwriting spa_version! Yikes.

Later on, kstat calls arc_kstat_update() with the correct ksp - so values are put back to real ones. You'd have to be pretty lucky with timing to "see" it running kstat, since the calls are right after each other. But really, we are writing ints, and strings, over struct members that are neither, then writing the correct ones back.

imtiazdc commented 3 years ago

It comes down to: static int osx_kstat_update(kstat_t *ksp, int rw) which expects that ksp to be:

typedef struct osx_kstat {
  kstat_named_t spa_version;
  kstat_named_t zpl_version;

and that update function does indeed update those members. I believe this is correct (hurrah), but then we call

      arc_kstat_update(ksp, rw);
      arc_kstat_update_osx(ksp, rw);

using the same ksp.

In this case: arc_kstat_update(kstat_t *ksp, int rw) expects the struct

typedef struct arc_stats {
  kstat_named_t arcstat_hits;
  kstat_named_t arcstat_misses;
  kstat_named_t arcstat_demand_data_hits;

So when arc_kstat_update() updates arcstat_hits it is actually overwriting spa_version! Yikes.

Later on, kstat calls arc_kstat_update() with the correct ksp - so values are put back to real ones. You'd have to be pretty lucky with timing to "see" it running kstat, since the calls are right after each other. But really, we are writing ints, and strings, over struct members that are neither, then writing the correct ones back.

Hmm, interesting. Will you be able to submit the fix sometime soon? Say in a week? If not, I will have someone take a look at this thread and raise a PR. Please do let me know.

lundman commented 3 years ago

Yeah, I'll push something tomorrow

imtiazdc commented 3 years ago

Yeah, I'll push something tomorrow

@lundman Are there more changes in the pipeline? With this commit alone, we still don't see the arc_max limit being honored.

Total RAM: 128 GB ZFSin memory usage : 63.4 GB zfs_arc_max : 15 GB

lundman commented 3 years ago

Through other commits I think we have corrected this. Please re-open if issues remain.

imtiazdc commented 3 years ago

Through other commits I think we have corrected this. Please re-open if issues remain.

Thanks, @lundman. Yes, your latest checkins do honor the arc_max limit. The arc_max limit does get violated for a brief period but then the arc usage falls below the arc_max within a few seconds.