rfeng2023 / mmcloud


Batch exporting job attachment #59

Closed hsun3163 closed 6 months ago

hsun3163 commented 7 months ago

It is hard to trace an error message in the actual stderr output file back to the status of the job that produced it. Therefore, I was wondering if there is an interface for us to download all the job attachments for each job ID. The downloaded files could simply be organized in a hierarchy like XXX/job_id/files.

The needed features are:

  1. It needs to accommodate batch operation via a command-line interface so that we don't have to do it manually.
  2. It needs some way to distinguish the files of one job from those of another, be it via different folders or different file names.
  3. Ideally, we could download the files based on some filtering, such as job status, user ID, job name, etc.
Ashley-Tung commented 7 months ago

Hi @hsun3163, as of now MMC does not support mass downloading of logs from the GUI. We only support downloading one file at a time from the chosen job.

For the CLI, you can run float -j JOB_ID log ls to list the logs of a given job ID. One option here is to write a script that parses all the job IDs from float squeue (or a filtered view), then uses those IDs to save the log file contents via float log cat ~into a zip folder with float log download -j JOB_ID~ CORRECTION: the download option is only available in 2.5; the Opcenter is currently on a private build of 2.4.1. A rough sketch of such a script follows.
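The sketch below is only an illustration of that idea, not a tested workflow: it assumes job IDs appear in the first column of float squeue output, that float -j JOB_ID log ls lists file names in its first column, and that float -j JOB_ID log cat FILE prints a named log file to stdout. All three are assumptions; check float squeue --help and float log --help on your Opcenter build and adjust accordingly.

```bash
#!/usr/bin/env bash
# Rough sketch: collect job IDs from `float squeue` and save each job's log
# files into a per-job folder. Column indices and the exact `float log cat`
# arguments are assumptions -- verify against your Opcenter's CLI help.
set -euo pipefail

outdir="job_logs"                      # hypothetical output directory
mkdir -p "$outdir"

# Assumption: job IDs are in the first column; skip the header line.
for job_id in $(float squeue | awk 'NR > 1 {print $1}'); do
    mkdir -p "$outdir/$job_id"
    # Assumption: log file names are in the first column of `log ls` output.
    for log_file in $(float -j "$job_id" log ls | awk 'NR > 1 {print $1}'); do
        # Save each log's contents; don't abort the whole run on one failure.
        float -j "$job_id" log cat "$log_file" > "$outdir/$job_id/$log_file" || true
    done
done
```

The per-job folders under job_logs/ mirror the XXX/job_id/files layout requested above, and the whole directory can be zipped afterwards if a single archive is preferred.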

Ashley-Tung commented 7 months ago

Another way to get the logs of a job (again with the help of a script), if you have access to the Opcenter itself, is that the logs of each job are stored under /mnt/memverge/slurm/work. You will see two levels of directories that correspond to the first two pairs of characters of the job ID. For example, if my job ID is jkbzo4y7c529fiko0jius, then I know the contents are stored under /mnt/memverge/slurm/work/jk/bz/o4y7c529fiko0jius.
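For use in a script, that path can be reconstructed from a job ID with plain string slicing. This is just a small sketch of the layout described above, using the example job ID from this comment:

```bash
# Example job ID taken from the comment above
job_id="jkbzo4y7c529fiko0jius"
# First pair of characters ("jk"), second pair ("bz"), and the remainder
prefix1=${job_id:0:2}
prefix2=${job_id:2:2}
rest=${job_id:4}
echo "/mnt/memverge/slurm/work/${prefix1}/${prefix2}/${rest}"
# Prints: /mnt/memverge/slurm/work/jk/bz/o4y7c529fiko0jius
```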