radical-collaboration / hpc-workflows

NSF16514 EarthCube Project - Award Number:1639694

Job efficiency on Cheyenne #87

Closed Weiming-Hu closed 5 years ago

Weiming-Hu commented 5 years ago

Hi, I have a question about how EnTK is managing tasks.

For the specific job submission below, where I requested 3600 cores (100 nodes) for 120 minutes in the resource script, I have the following record in the Cheyenne allocation monitoring system:

| Username | Job Id | Job Name | Queue Name | Submit Time | Start Time | End Time | # Nodes | Adjusted Core Hours |
|---|---|---|---|---|---|---|---|---|
| wuh20 | 4844372 | pilot.0000 | regular | 2019-03-29 09:11:54 | 2019-03-29 09:12:16 | 2019-03-29 09:39:04 | 100 | 1,608.00 |

Does the allocation start counting as soon as I start EnTK?

The reason I'm asking is that, for this specific job, my program never actually started because of a configuration error that showed up later in the process. It shows that the job ran for about 20 minutes, which I remember (if I'm not mistaken) this time was spent by EnTK submitting and transitioning tasks. There are about 400 tasks in a stage, and EnTK seems to process them one by one, so it seems to take some time (20 minutes).

During these 20 minutes, were all 3600 cores just waiting? It looks like it from the figures (09:39 - 09:12 = 27 minutes; 27/60 h * 3600 cores = 1620 ≈ 1608 core hours).
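The estimate above can be checked against the exact timestamps in the allocation record. The snippet below is only a sketch: it assumes the charge is simply wall time × nodes × cores per node (36 cores per node, from 3600 cores / 100 nodes), and the function name `charged_core_hours` is made up for illustration, not part of any Cheyenne or EnTK tool.

```python
from datetime import datetime

# Assumption: 36 cores per node, i.e. the 3600 cores / 100 nodes requested above.
CORES_PER_NODE = 36


def charged_core_hours(start, end, nodes, cores_per_node=CORES_PER_NODE):
    """Core hours for a whole-node reservation held from start to end.

    Assumes the charge is wall time x nodes x cores per node, regardless
    of whether any task was actually running on the cores.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    wall = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    wall_hours = wall.total_seconds() / 3600.0
    return wall_hours * nodes * cores_per_node


# Start and end times taken from the allocation record for job 4844372.
core_hours = charged_core_hours(
    "2019-03-29 09:12:16", "2019-03-29 09:39:04", nodes=100
)
print(f"{core_hours:.2f}")
```

With the exact wall time of 26 min 48 s instead of the rounded 27 minutes, this simple model gives the same 1,608.00 core hours as the record, which is consistent with the whole allocation being charged from job start to job end.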

This specific job caught my attention because a sysadmin reached out to notify me of the inefficient job run. Below is the info I received:

| JobID | UserName | Job_Start_Time | Job_End_Time | #Nod | #CPU | #CPU-hr | #Wall-hr | #Node-hr | %Eff |
|---|---|---|---|---|---|---|---|---|---|
| 4844372 | wuh20 | 19/03/29 09:12:16 | 19/03/29 09:39:04 | 100 | 3600 | 0.00 | 0.45 | 44.67 | 0.0% |
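For reference, the numbers in this report can be reproduced with a few lines of arithmetic. This is a sketch under an assumed definition: the report does not say how %Eff is computed, so the snippet guesses that it is CPU hours used divided by core hours reserved (wall hours × number of CPUs). The function name `job_efficiency` is hypothetical.

```python
def job_efficiency(cpu_hours, wall_hours, n_cpus):
    """Fraction of reserved core hours spent on CPU work.

    Assumed definition (not confirmed by the report):
        efficiency = cpu_hours / (wall_hours * n_cpus)
    """
    reserved = wall_hours * n_cpus
    if reserved == 0:
        return 0.0
    return cpu_hours / reserved


# 09:12:16 to 09:39:04 is 26 min 48 s, i.e. 1608 seconds of wall time.
wall_hours = 1608 / 3600
node_hours = wall_hours * 100  # 100 nodes held for the whole wall time
eff = job_efficiency(cpu_hours=0.00, wall_hours=wall_hours, n_cpus=3600)

print(f"wall-hr={wall_hours:.2f} node-hr={node_hours:.2f} eff={eff:.1%}")
```

Under that assumption, 0.00 CPU hours over 0.45 wall hours on 3600 CPUs gives 0.0% efficiency and 44.67 node hours, matching the row above.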

Thank you very much!

andre-merzky commented 5 years ago

AFAICS, this is indeed how EnTK is supposed to function: the resources are requested immediately.

> It shows that the job ran for about 20 minutes, which I remember (if I'm not mistaken) this time was spent by EnTK submitting and transitioning tasks.

What exactly was EnTK doing in that time, do you know?

Weiming-Hu commented 5 years ago

I don't know exactly what EnTK was doing. But based on my recollection, it was managing tasks 0 through 599, i.e. submitting them and transitioning them between states.

andre-merzky commented 5 years ago

Can you please provide the log files?

@vivek-bala: Do you have any idea what EnTK could be spending 20 min on?

Weiming-Hu commented 5 years ago

I don't have them saved. I need to resubmit the jobs. I'll update this issue when I get the log files. Thank you.

andre-merzky commented 5 years ago

Thank you.

vivek-bala commented 5 years ago

No, not really. We do bulk submission of the tasks already. Hard to say without the profiles.

mturilli commented 5 years ago

Blocked by #88