ncsa / ncsa_system_documentation

create page on basic code benchmarking, job optimization, and job scheduling (why job not run) #32

Open craigsteffen opened 6 months ago

craigsteffen commented 6 months ago

Create a landing page to talk new users through the general ideas of:

pmenstrom commented 6 months ago

This would be very helpful.

Definitely look at the documentation for other centers and see what we can glean from them. "Imitation is the sincerest form of flattery"

craigsteffen commented 6 months ago

Ok, I just realized that we're probably going to want to build a fairly solid page on the documentation hub that's generically about running jobs on Slurm. I would guess a solid percentage of the content of that page will be from the Delta "running jobs" section, which is fairly extensive.

Basically we'll bring out all the Slurm commands and stuff that isn't machine-specific, and just leave the sample job scripts, which are specific to the machine.

Since I have this task to start creating general job-related pages, I think it might make sense for the first step to be to create the overall running-jobs page on the hub, and then I'll populate the new pages, then we can start bringing the other pages into it as it's convenient.

I'll do it in a separate branch, because it's going to take time.

Any objections, @lhelms2?

lhelms2 commented 6 months ago

@craigsteffen I already have a Slurm page that is basically done; I was just waiting for other change reviews to be completed before sending it in for review/approval. I would propose that that Slurm page be implemented, and then you can update it in a separate branch after that. If you'd like to discuss in a call, let me know.

craigsteffen commented 6 months ago

Status note: We had a talk about this. @lhelms2 is going to implement the new directory structure for a broken-up Slurm/running jobs page, but keep it hidden, and then merge that into main. That will take a week or so.

Then I can grab that, open a new branch in which I populate more pages, and then we'll work on getting that put into the public page, as we transfer stuff to it from the Delta page (and others).

@lhelms2 You can just ping me here when you're done with the invisible structure.

craigsteffen commented 5 months ago

@lhelms2 Peter and I talked today about getting a page up about how fair-share works, so that we can point users at it.

From the discussion here, it looks like I was waiting for you to implement some changes, and then I was going to start composing the "why my job not run?" page. You may have already made those structural changes? On the other hand, I see that the proposed_changes branch has a bunch of commits yesterday and today.

So we should chat before I start to work on that page. I'm away at a hackathon but I'll be back on Thursday.

cc @pmenstrom

lhelms2 commented 5 months ago

@craigsteffen yes, those structural changes were made (I pinged you in Slack instead of here when I did that last month).

Yes, we should probably chat again before you start working on a new page to make sure it doesn't conflict with what I've been working on.

(cross-ref issue #25)

lhelms2 commented 5 months ago

@craigsteffen I sent you a calendar invite for Thursday. If that time doesn't work, just let me know and I'll move it.

lhelms2 commented 5 months ago

@craigsteffen I merged proposed_changes into main so you can go ahead and pull a new branch (please don't start in proposed_changes) and make the changes to common/slurm/... that you want. I'll stay out of the Slurm pages until they're ready for review to avoid any merge conflicts. (We can still meet on Thursday if you want, or just decline the invite.)

craigsteffen commented 5 months ago

@lhelms2 Right. For a big set of changes, I wouldn't start dropping them directly into proposed_changes (unless they were to concealed files, like ones changed to .txt or whatever). Yeah, I'll do my work in another branch and then we can review and merge later.

I accepted the Thursday meeting. Let's keep it, even if it's two minutes of "yep, that's what I was thinking too, ok, cool". That's valuable.