oar-team / oar3

OAR: versatile resource and job manager for cluster (third generation)

How does a user watch the Karma? #65

Open bzizou opened 1 month ago

bzizou commented 1 month ago

With OAR2, users could see the Karma associated with the scheduling of a job using oarstat, for example:

26199695  R jovanovn    0:20:40 R=2,W=1:0:0,J=B,P=epimed,T=heterogeneous|verysmall (Karma=-0.016,quota_ok)
26199698  R collaog     0:18:08 R=1,W=24:0:0,J=B,N=AN40LGMcBn5078,P=elmerice,T=heterogeneous|veryverysmall (Karma=0.005,quota_ok)

The Karma is part of the string stored in the message field of the jobs table. With OAR3, it seems that the message does not contain any Karma information, and in any case the message is not printed by oarstat.

How does the user know about the Karma with OAR3?
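For reference, here is a minimal sketch of how the Karma could be pulled out of a message string shaped like the OAR2 examples quoted above. The function name `extract_karma` and the regular expression are illustrative assumptions, not part of OAR; the pattern only assumes a `Karma=<number>` token somewhere in the message.

```python
import re
from typing import Optional

# Assumption: the Karma appears as "Karma=<signed decimal>" inside the
# message, as in the oarstat lines quoted in this issue.
KARMA_RE = re.compile(r"Karma=(-?\d+(?:\.\d+)?)")


def extract_karma(message: str) -> Optional[float]:
    """Return the Karma found in a job message, or None if it is absent."""
    match = KARMA_RE.search(message)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    msg = "R=2,W=1:0:0,J=B,P=epimed,T=heterogeneous|verysmall (Karma=-0.016,quota_ok)"
    print(extract_karma(msg))
    print(extract_karma("R=1,J=I,Q=default"))
```

A message without any Karma token yields `None`, which makes the "no Karma information" case easy to tell apart from a Karma of zero.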

augu5te commented 1 month ago

If fairshare scheduling is enabled, the Karma value is normally added in the job_message function: https://github.com/oar-team/oar3/blob/c6c84f5093def5670fa6e4478dec4c7c5c0fdb26/oar/lib/job_handling.py#L630

So fixing the printing issue of oarstat should be sufficient.

bzizou commented 1 month ago

Well, this leads to another question: actually, fairsharing was not enabled in my configuration, because I think the JOB_PRIORITY configuration variable is missing from the default oar.conf file...

augu5te commented 1 month ago

Sure, fairsharing is disabled by default (JOB_PRIORITY="FIFO"), and oar.conf needs some care and polish.

bzizou commented 1 month ago

Actually, oarstat in its default mode does print the message. The Karma appears in the message at times, but not once the job is running...

root@dahu-oar3:~# oarstat 

  Job id   State     User           Duration   System message                                    Queue    
 ──────────────────────────────────────────────────────────────────────────────────────────────────────── 
  19128    Running   marets         0:01:07    R=32,J=I,Q=default                                default  
  19129    Running   bzizou         0:00:04    R=1,W=1800,J=I,Q=default (Karma=0.0)              default  

A few seconds later:

root@dahu-oar3:~# oarstat 

  Job id   State     User     Duration   System message       Queue    
 ───────────────────────────────────────────────────────────────────── 
  19128    Running   marets   0:01:37    R=32,J=I,Q=default   default  
  19129    Running   bzizou   0:00:34    R=1,J=I,Q=default    default
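To make the difference between the two outputs explicit, the following sketch compares the fields of two message strings and lists those that disappeared. The helper names `message_fields` and `dropped_fields` are hypothetical, and the parsing only assumes the `key=value,... (key=value,...)` layout visible in the outputs above; it is not how OAR itself builds or reads the message.

```python
def message_fields(message: str) -> dict:
    """Split e.g. 'R=1,W=1800,J=I,Q=default (Karma=0.0)' into a dict."""
    # Treat the parenthesised suffix as additional comma-separated tokens.
    flat = message.replace(" (", ",").rstrip(")")
    fields = {}
    for token in flat.split(","):
        key, sep, value = token.partition("=")
        if sep:  # skip bare flags such as 'quota_ok'
            fields[key.strip()] = value.strip()
    return fields


def dropped_fields(before: str, after: str) -> list:
    """Return the sorted field names present in `before` but not in `after`."""
    return sorted(set(message_fields(before)) - set(message_fields(after)))


if __name__ == "__main__":
    first = "R=1,W=1800,J=I,Q=default (Karma=0.0)"
    later = "R=1,J=I,Q=default"
    print(dropped_fields(first, later))
```

Applied to the two messages of job 19129, this reports that both the Karma and the W (walltime) field are gone in the later output, which suggests the whole message is being rewritten rather than only the Karma part.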