Closed bviviano closed 3 months ago
I'm noticing this even with only a single job on a node:
Slurm Web 2.2.2
Slurm Web 2.2.5
There are 3 changes that are the potential culprits:
The major code changes between https://github.com/edf-hpc/slurm-web/commit/c233323e514fb41e9d2490a69212378130c0712b and https://github.com/edf-hpc/slurm-web/commit/c1d1ad2f41efc365ad3bfea6fd380c15c3033d8d are where the error was introduced. Reverting only the 2d-draw.js file back to https://github.com/edf-hpc/slurm-web/blob/86ac1c6d1b3873de80320ec06bc2fed6bdb26e0c/dashboard/js/draw/2d-draw.js gets it working mostly like normal, but there are still some errors with the layout.
I think I've tracked an issue down in 2.2.5 to the getCoreABSCoordinates function. I noticed that, at certain times, multiple cores map to the same X,Y pairs. I'm looking at doing a fix sometime soon, if I can figure it out.
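For anyone trying to confirm the same symptom, here is a minimal sketch of how colliding positions could be detected. It does not call the real getCoreABSCoordinates (its signature is not shown in this thread); `coordsOf`, `findCoordinateCollisions` and `buggyCoords` are hypothetical names, and `coordsOf` stands in for whatever maps a core index to an `{x, y}` position.

```javascript
// Hypothetical debugging helper, not part of slurm-web.
// `coordsOf(core)` is an assumed stand-in returning { x, y } for a core index.
function findCoordinateCollisions(coreCount, coordsOf) {
  const seen = new Map(); // "x,y" -> list of core indices drawn at that spot
  for (let core = 0; core < coreCount; core++) {
    const { x, y } = coordsOf(core);
    const key = `${x},${y}`;
    if (!seen.has(key)) seen.set(key, []);
    seen.get(key).push(core);
  }
  // Keep only the positions shared by more than one core.
  return [...seen.entries()]
    .filter(([, cores]) => cores.length > 1)
    .map(([position, cores]) => ({ position, cores }));
}

// Toy mapping with a deliberate bug: the row divisor is twice the column
// count, so cores in adjacent rows land on the same position.
function buggyCoords(core) {
  const columns = 4;
  return { x: core % columns, y: Math.floor(core / (columns * 2)) };
}

// Toy mapping without the bug, for comparison.
function fixedCoords(core) {
  const columns = 4;
  return { x: core % columns, y: Math.floor(core / columns) };
}

console.log(findCoordinateCollisions(8, buggyCoords));
console.log(findCoordinateCollisions(8, fixedCoords));
```

An empty result means every core gets its own square; any entry lists the cores that would be drawn on top of each other.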
So, 2 major issues I've found:
In the buildAllocatedCPUs function in /util/jobs.js, you will only end up getting one layout returned for a node, regardless of how many jobs (and therefore how many layouts) should be returned. This is because it overwrites the value for the 'layout' key on every job. When this gets fixed, it will necessitate an update of the drawCores function in /draw/2d-draw.js, as it is only expecting a single layout.
[2, 3, 4, 5, 6, 7, 8] with 14 allocated cores, and layout [0,1] with 4 allocated cores respectively. In this case, I think it actually makes sense to revert to just doing drawCores by the allocatedCPUs instead of the layout. Note that you could do the layout drawing, but it would require working a little magic to properly show the cores used.
Just checked 2.2.6 and it has the same issue w/ Slurm 19.05.8. Any idea if there is a way to fix it?
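To illustrate the overwrite described for buildAllocatedCPUs, here is a rough sketch of keeping one layout per job instead of a single 'layout' value per node. This is not the slurm-web code: the job shape (`nodes` mapping a node name to a list of core indices), the node name `cn1`, the job ids and the function name `buildLayoutsByNode` are all assumptions made for the example.

```javascript
// Sketch only. Assumed (hypothetical) input shape:
//   jobs = { jobId: { nodes: { nodeName: [coreIndex, ...] } } }
function buildLayoutsByNode(jobs) {
  const byNode = {};
  for (const [jobId, job] of Object.entries(jobs)) {
    for (const [node, layout] of Object.entries(job.nodes)) {
      if (!byNode[node]) byNode[node] = { layouts: {} };
      // Key by job id so a later job cannot overwrite an earlier one.
      byNode[node].layouts[jobId] = layout.slice();
    }
  }
  return byNode;
}

// Two jobs on the same node, using the layouts quoted in this thread.
const exampleJobs = {
  "101": { nodes: { cn1: [2, 3, 4, 5, 6, 7, 8] } },
  "102": { nodes: { cn1: [0, 1] } },
};

const layoutsByNode = buildLayoutsByNode(exampleJobs);
console.log(Object.keys(layoutsByNode.cn1.layouts).length); // one layout per job
```

A drawing function that consumes this would then have to loop over `layouts` rather than read a single array, which is the drawCores change mentioned above.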
I ended up re-writing the functions to match the old functionality, partially to add support for a GPUs page. Some of the relevant code can be seen here: https://github.com/BSCrumpton/slurm-web/tree/GPUBranch
So should I just replace jobs.js and 2d-draw.js with the ones from your repo, or do you have a README someplace with additional instructions?
Thanks.
Honestly, maybe just dashboard/js/draw/2d-draw.js. Note that I haven't tested this in a while, so I'm not 100% sure. No other README, but I should add it to the docket in the future :joy:
I replaced the draw/2d-draw.js from the tagged 2.2.6 branch with the one from your repo and it's now drawing correctly. I think the 3d draw might still be off, but no one really uses that, except for a demo, and then no one cares about the cores.
Any pointers as to what the GPU changes you made do and how to incorporate them?
Wanted to edit to add: you do need utils/jobs.js as well, or the node count drawing gets off due to a math error.
Basically, I added another tab to the main menu (top right) called GPUs that displays similarly to JobsMap, but showing GPUs instead of cores.
Additionally, in the main Jobs tab, resources now show GPUs. Note that this functionality is entirely dependent on your slurm/pyslurm version. I'm using the TRES fields to get the number of free and allocated GPUs, and older slurm versions don't support that field at all.
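As a rough illustration of what reading GPUs from TRES can look like, here is a small sketch that pulls a GPU count out of a TRES-style string such as `cpu=8,mem=32G,node=1,billing=8,gres/gpu=2`. It is not the code from the GPUBranch: the function name `countGpusFromTres` is hypothetical, and which pyslurm field actually carries the string depends on the version, so treat the input as an assumption.

```javascript
// Sketch only: count GPUs in a comma-separated TRES string.
// Assumes entries look like "name=value", with GPUs reported as
// "gres/gpu=N" and optionally per type as "gres/gpu:<type>=N".
function countGpusFromTres(tres) {
  if (!tres) return 0; // older Slurm versions may not provide the field at all
  let generic = null; // value of the untyped "gres/gpu" entry, if present
  let typed = 0; // sum of the "gres/gpu:<type>" entries
  for (const entry of tres.split(",")) {
    const [name, value] = entry.split("=");
    if (name === undefined || value === undefined) continue;
    const count = parseInt(value, 10);
    if (Number.isNaN(count)) continue;
    if (name === "gres/gpu") generic = count;
    else if (name.startsWith("gres/gpu:")) typed += count;
  }
  // Prefer the untyped total so typed entries are not counted twice.
  return generic !== null ? generic : typed;
}

console.log(countGpusFromTres("cpu=8,mem=32G,node=1,billing=8,gres/gpu=2"));
console.log(countGpusFromTres("cpu=4,mem=16G"));
```

Returning 0 when the string is missing keeps the page usable on clusters where the field is not available.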
Thanks for the screenshot, that makes it clear what you're doing. I only have 4 GPGPUs on my cluster, across 200 nodes / 5 racks, so it wouldn't really matter too much to my users right now, but it's a nice extension.
There are two problems, in dashboard/js/draw/2d-draw.js and dashboard/js/utils/jobs.js, and I fixed them like this. Later I will push my local code to solve this problem.
This issue concerns Slurm-web v2 which is not maintained anymore. You are highly encouraged to test the new version v3.0.0. The quick start guide for v3.0.0 is available online: https://docs.rackslab.io/slurm-web/install/quickstart.html
Unless someone is motivated to maintain the old version of Slurm-web or you have a justified reason to keep this issue open, it will be closed in a few weeks.
For the reasons explained in the previous comment, I finally close this issue.
Something changed in 2d-draw.js between 2.2.2 and 2.2.5 (I am trying to isolate it now) related to multiple jobs running on a single node.
In 2.2.2, when multiple jobs were running on a single node, each square representing a used core got a different color. In 2.2.5, only the last core/job gets a color; the other used cores show as if they are not allocated.
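The 2.2.2 behaviour described above can be sketched as follows: every job on the node gets its own color, and each core takes the color of the job it belongs to. This is only an illustration of the expected result, not the drawCores implementation; `colorCores`, the palette and the job ids are hypothetical.

```javascript
// Sketch only: compute a color per core from per-job core lists.
// `layoutsByJob` is an assumed shape: { jobId: [coreIndex, ...] }.
function colorCores(coreCount, layoutsByJob, palette) {
  const colors = new Array(coreCount).fill(null); // null = core not allocated
  Object.keys(layoutsByJob).forEach((jobId, i) => {
    const color = palette[i % palette.length]; // one color per job
    for (const core of layoutsByJob[jobId]) {
      // Ignore indices outside the node so a bad layout cannot grow the array.
      if (core >= 0 && core < coreCount) colors[core] = color;
    }
  });
  return colors;
}

// Two jobs on a 4-core node: both jobs keep their color.
console.log(colorCores(4, { "101": [2, 3], "102": [0] }, ["red", "blue"]));
```

With the 2.2.5 symptom, only the cores of the last job would come back colored and the rest would stay `null`.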
I am attaching pictures from my interface:
Slurm Web 2.2.2
Slurm Web 2.2.5
These are from the same running HTTPd instance, same node captured via screenshot, just different install directories for 2.2.2 vs. 2.2.5.
I've isolated the issue to the drawCores function in 2d-draw.js. I am working through the code to try and understand why it is no longer drawing as expected, but thought I'd open this ticket in case there is something else I am missing.