Closed Weiming-Hu closed 5 years ago
AFAICS, this is indeed how EnTK is supposed to function: the resources are requested immediately.
It shows that the job ran for about 20 minutes which, if I remember correctly, EnTK spent submitting and transitioning tasks.
What exactly was EnTK doing in that time, do you know?
I don't know exactly what EnTK was doing. But as far as I recall, it was managing tasks 0 - 599, i.e. submitting them and transitioning their states.
Can you please provide the log files?
@vivek-bala : Do you have any idea what EnTK could spend 20 min on?
I don't have them saved, so I need to resubmit the jobs. I will update this issue when I have the log files. Thank you.
Thank you.
No, not really. We do bulk submission of the tasks already. Hard to say without the profiles.
Blocked by #88
Hi, I have a question about how EnTK is managing tasks.
For the specific job submission below, where I requested 3600 cores (100 nodes) for 120 minutes in the resource script, I have the following record in the Cheyenne allocation monitoring system:
Does the allocation start counting as soon as I start EnTK?
The reason I am asking is that for this specific job, my program never actually started because of a configuration error that showed up later in the process. It shows that the job ran for about 20 minutes which, if I remember correctly, EnTK spent submitting and transitioning tasks. There are about 400 tasks in a stage and EnTK seems to process them one by one, so it takes some time (20 minutes).
During these 20 minutes, were all 3600 cores just waiting? It looks that way from the figures (09:39 - 09:12 = 27 minutes; 27/60 h * 3600 cores = 1620 core-hours ~ 1608).
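For reference, here is a minimal sketch of that estimate in Python. It assumes the allocation is charged for wall-clock time across all requested cores, whether or not they do any work; that is my reading of the figures, not a confirmed accounting rule. The function name `core_hours` is made up for illustration.

```python
from datetime import datetime


def core_hours(start: str, end: str, cores: int) -> float:
    """Core-hours for a job running from `start` to `end` (HH:MM, same day).

    Assumption: every requested core is charged for the full wall-clock time.
    """
    fmt = "%H:%M"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # Multiply before dividing so the result stays exact for whole minutes.
    return elapsed.total_seconds() * cores / 3600


# 09:12 to 09:39 on 3600 cores (100 nodes)
print(core_hours("09:12", "09:39", 3600))  # 1620.0
```

The result is close to the 1608 quoted above, which would be consistent with all 3600 cores being held for almost the whole 27 minutes.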
This specific job caught my attention because the sys admin reached out to notify me of the inefficient job run. Below is the info I received:
Thank you very much!