rackslab / Slurm-web

Open source web dashboard for Slurm HPC clusters
https://slurm-web.com
GNU General Public License v3.0

Gateway - Resources view "Cannot read properties of undefined" #334

Closed. attssystem closed this issue 3 months ago.

attssystem commented 3 months ago

Hi :)

I'm running the gateway (3.1.0-1.deb12, in production mode behind Apache) in a Debian 12 container and I have an issue with the Resources view. The racks display fine, but as soon as the view tries to refresh the node states, the page goes blank and I have to reload it to get a working view again. Until it goes blank, I can point at nodes and get their status without any problem.

This issue appears in Safari and Chrome on macOS 14, and in Chrome on Windows 11.

In the developer console (tested with Chrome on macOS), I get these messages when the Resources page loads (the last one being the error, of course):

Start polling nodes on cluster helvetios                                           index-45f37282.js:33

Slurm-web gateway API get /agents/helvetios/nodes                     index-45f37282.js:13

TypeError: Cannot read properties of undefined (reading 'length')
    at p5.render (index-45f37282.js:33:149701)
    at index-45f37282.js:33:150114
    at Array.map (<anonymous>)
    at m5 (index-45f37282.js:33:150105)
    at w (index-45f37282.js:33:153123)
    at index-45f37282.js:33:153356
    at dl.fn (index-45f37282.js:9:9114)
    at dl.run (index-45f37282.js:9:1517)
    at get value (index-45f37282.js:9:9359)
    at index-45f37282.js:33:154569

Despite this error, the page still loads correctly. The same messages then appear again when the page goes blank, and regularly after that.

I can't see anything suspicious in the gateway logs.

Let me know if I can provide more information.

rezib commented 3 months ago

Hello @attssystem, I suspect this is a duplicate of #328. Do you have node names that do not end with digits?

attssystem commented 3 months ago

Unfortunately, all my node names end with digits (a letter followed by 3 digits); I just checked with sinfo.

By the way, I just noticed that even in the working view, the node list is empty (see screenshot below), as in #328:

Screenshot 2024-08-19 at 15 10 56

The 2D rack view lacks information because I hid part of it, but the list below it (Nodename .. State .. Allocation ...) is really empty.

attssystem commented 3 months ago

By the way, clicking on a node works and displays all its info.

rezib commented 3 months ago

By any chance, did you check whether switching back to API v0.0.39 solves this issue, as for #335?

attssystem commented 3 months ago

I tried it at the same time, but it made no difference. I should have added a comment :)

rezib commented 3 months ago

The error occurs in the foldNodeset() function. There are unit tests covering many node name patterns, and your pattern (one character followed by 3 digits) seems to be covered already.

I can't figure out what is going on here; I need more input to understand. Can you send me the output of:

$ curl --unix-socket /run/slurmrestd/slurmrestd.socket http://slurm/slurm/v0.0.40/nodes

If you are concerned about privacy, you can also send me the output by email at hidden.
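
For readers who want to check the same thing locally, here is a minimal Python sketch that lists node names not ending with a digit in the REST output. It assumes the response is a JSON object with a top-level `nodes` list whose entries carry a `name` field; the function name `names_without_digit_suffix` is made up for this example and is not part of Slurm-web or Slurm.

```python
import re


def names_without_digit_suffix(payload: dict) -> list[str]:
    """Return the node names that do not end with a digit.

    Assumption: `payload` is the parsed JSON of the /nodes response,
    shaped like {"nodes": [{"name": "..."}, ...]}.
    """
    suspects = []
    for node in payload.get("nodes", []):
        name = node.get("name", "")
        # A name ending with a digit matches the usual foldable pattern.
        if not re.search(r"\d$", name):
            suspects.append(name)
    return suspects
```

To use it, save the curl output to a file, parse it with `json.load()`, and pass the resulting dict to the function. Any name it returns is worth comparing with what sinfo shows.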

attssystem commented 3 months ago

Sent ;)

rezib commented 3 months ago

After some investigation by email, it turned out that the slurmrestd REST API /nodes endpoint returned a node whose name contains no digits, even though it wasn't visible with sinfo. This issue is actually a duplicate of #328, whose fix has landed and will be part of the upcoming release v3.2.0.
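
To illustrate why a name without digits is a special case when folding node names, here is a rough Python sketch of a folding routine that keeps such names verbatim instead of failing on them. This is not Slurm-web's foldNodeset() (which lives in the frontend) and not the fix from #328; `fold_nodeset_sketch` and its output format are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def fold_nodeset_sketch(names: list[str]) -> str:
    """Fold names such as h001, h002 into h[001-002]; keep others as is."""
    groups: dict[tuple[str, int], list[int]] = defaultdict(list)
    plain: list[str] = []
    for name in names:
        match = re.fullmatch(r"(.*?)(\d+)", name)
        if match is None:
            # No numeric suffix: nothing to fold, keep the name untouched.
            plain.append(name)
            continue
        prefix, digits = match.groups()
        # Group by prefix and zero-padding width.
        groups[(prefix, len(digits))].append(int(digits))

    parts = []
    for (prefix, width), numbers in sorted(groups.items()):
        numbers = sorted(set(numbers))
        # Collapse consecutive numbers into (start, end) ranges.
        ranges = []
        start = prev = numbers[0]
        for n in numbers[1:]:
            if n != prev + 1:
                ranges.append((start, prev))
                start = n
            prev = n
        ranges.append((start, prev))
        if len(numbers) == 1:
            parts.append(f"{prefix}{numbers[0]:0{width}d}")
        else:
            body = ",".join(
                f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
                for a, b in ranges
            )
            parts.append(f"{prefix}[{body}]")
    return ",".join(parts + sorted(plain))
```

The point of the sketch is the `match is None` branch: code that assumes every name splits into a prefix and a numeric suffix ends up reading properties of a missing value as soon as one name does not fit.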