hbs-rcs / hbsgrid-docs

Documentation and discussion forum for the HBS grid technology preview
https://hbs-rcs.github.io/hbsgrid-docs/

Improve Available Resources utility to show unreserved memory #15

Closed: izahn closed this issue 2 years ago

izahn commented 2 years ago

Discussed in https://github.com/hbs-rcs/hbsgrid-docs/discussions/14

Originally posted by **Econometrica17212** on January 25, 2022:

Hey all, I'm having trouble getting an interactive job launched and thought I'd share the issues so that others might learn from my mistakes. I'm currently trying to launch a `short_int` RStudio job with 500 GB of RAM and 4 CPUs. I'm working with a 390 GB dataset, so I need the maximum amount of RAM for this. Luckily, the operations I'm running shouldn't need much more RAM than that, as I'm just doing some basic analysis and regressions, not data manipulation.

When I submit this via the tech preview environment using the submission GUI, I get the following error:

![Screenshot from 2022-01-25 10-54-59error](https://user-images.githubusercontent.com/87214219/151011612-9fea8507-fe4b-4cfc-95d0-15597071c58d.png)

The user guide has a video documenting a similar issue, where the user requests 10 CPUs and 4 GB of RAM, but the error message there is different. In the video, the usage report shows that no node has 10 CPUs available, so the job is resubmitted with 2 CPUs and it works. When I check the HBS Cluster usage monitor, however, there seems to be ample room for my job to run, e.g. on node 13, so I'm a bit confused about why I'm getting this error.

Running `bqueues` also shows that there are 28 pending jobs. Could this be why? Am I just behind in a long line of jobs right now? For context, I am not running any other jobs, pending or running, on any queue when attempting this.

![Screenshot from 2022-01-25 10-55-38usage](https://user-images.githubusercontent.com/87214219/151011864-8b2f07b8-8d9c-4dad-a582-15fc9a21c47f.png)

Hopefully there is a simple explanation here that will help others understand too! (Also worth noting: the button in the first image does not say "Read Documentation," so that could be changed to make it clearer which button to press.)
izahn commented 2 years ago

We now have a new-and-improved available resources utility! Let me know if you see any other issues with it.
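The issue title points at the distinction between memory that is *unused* and memory that is *unreserved*. The following minimal Python sketch shows why a usage view based on unused memory can suggest "ample room" even though a job still cannot start. It assumes, as is typical for LSF schedulers (`bqueues` is an LSF command), that a job is placed only where enough CPUs and memory remain after subtracting what other jobs have *requested*, regardless of how much of that memory they are actually using. The `Node` class, its fields, the node names, and all numbers are hypothetical illustrations, not output from the HBS grid or its utility.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical snapshot of one compute node (illustrative fields only)."""
    name: str
    total_cpus: int
    reserved_cpus: int      # CPUs already allocated to running jobs
    total_mem_gb: float
    reserved_mem_gb: float  # memory requested by running jobs
    used_mem_gb: float      # memory those jobs are actually using right now

    @property
    def unused_mem_gb(self) -> float:
        # What a "how busy is the node" monitor might display.
        return self.total_mem_gb - self.used_mem_gb

    @property
    def unreserved_mem_gb(self) -> float:
        # What the scheduler can still hand out to a new job (assumption).
        return self.total_mem_gb - self.reserved_mem_gb

    @property
    def free_cpus(self) -> int:
        return self.total_cpus - self.reserved_cpus


def fits(node: Node, cpus: int, mem_gb: float) -> bool:
    """A job can start on a node only if enough CPUs AND memory are unreserved."""
    return node.free_cpus >= cpus and node.unreserved_mem_gb >= mem_gb


def candidate_nodes(nodes: list[Node], cpus: int, mem_gb: float) -> list[str]:
    """Names of the nodes where a request for `cpus` and `mem_gb` could start now."""
    return [n.name for n in nodes if fits(n, cpus, mem_gb)]


# Made-up example: node-a looks nearly idle by usage but is mostly reserved.
nodes = [
    Node("node-a", total_cpus=64, reserved_cpus=40,
         total_mem_gb=1000, reserved_mem_gb=600, used_mem_gb=100),
    Node("node-b", total_cpus=64, reserved_cpus=62,
         total_mem_gb=1000, reserved_mem_gb=200, used_mem_gb=150),
]

for n in nodes:
    print(f"{n.name}: unused={n.unused_mem_gb} GB, "
          f"unreserved={n.unreserved_mem_gb} GB, free CPUs={n.free_cpus}")

# A 4-CPU / 500 GB request fits nowhere: node-a has only 400 GB unreserved
# (despite 900 GB unused), and node-b has only 2 free CPUs.
print(candidate_nodes(nodes, cpus=4, mem_gb=500))
# A smaller 2-CPU / 300 GB request fits on both nodes.
print(candidate_nodes(nodes, cpus=2, mem_gb=300))
```

The same check also covers the CPU case from the user-guide video: a request for more CPUs than any node has free stays pending until it is reduced, just as a memory request larger than any node's unreserved memory does.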