gwforg / gwf

A flexible, pragmatic workflow tool.
https://gwf.app/
GNU General Public License v3.0

Jobs from "completed" moved to "shouldrun" #413

Closed LudvigOlsen closed 3 months ago

LudvigOlsen commented 7 months ago

Last night, I ran the gwf status summary twice in a row in my workflow without changing anything in between. Jobs that were marked as completed in the first summary were marked as shouldrun in the second.

[image]

gwfss is just an alias that prints the summary with timestamps around it:
alias gwfss="echo '----------------------------'; echo 'gwf status'; date; gwf status -f summary; date; echo '----------------------------'"

I am not sure how to reproduce this. It seems kind of random since I didn't change any files or anything.

dansondergaard commented 3 months ago

That sounds weird. I haven't heard of any other instances of this happening. It could be many things unrelated to gwf (the filesystem, Slurm reporting intermittent states, or jobs that are not completely finished but are counted as "completed" by gwf for a short while).

Have you experienced this since you reported it?

LudvigOlsen commented 3 months ago

Hi Dan, no, I haven't experienced it since. Not sure what went wrong. I can see how it could be a problem in Slurm and not gwf.

I have recently started using the following instead of gwf status since it returns instantly (gwf status takes 45 minutes with my giant workflow :-) ):

alias sqs='squeue -u $USER -o "%.10i %.9P %.8j %.8u %.2t %.10M %.6D %.6R" | awk '\''
NR > 1 {
    state = $5
    if (state == "R") state = "Running"
    else if (state == "PD") state = "Pending"
    else if (state == "CG") state = "Completing"
    else if (state == "CD") state = "Completed"
    else if (state == "F") state = "Failed"
    else if (state == "TO") state = "Timeout"
    # Add more state mappings as needed
    count[state]++
}
END {
    for (status in count) print status, count[status]
}'\'
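
A shorter variant of the same idea, as a rough sketch: squeue can print the long state name itself (%T) and drop the header (-h), so the tallying can be left to sort and uniq. The helper name count_states is made up for illustration, and the squeue call is left as a comment because it assumes a Slurm cluster.

```shell
# Hypothetical helper: tally job states read from stdin, one state per line.
count_states() {
    # uniq -c prints "<count> <state>"; awk swaps this to "<state> <count>".
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Assumed usage on a Slurm cluster:
#   squeue -u "$USER" -h -o "%T" | count_states
```

Like the alias above, this only covers jobs that squeue still lists, so it says nothing about whether the files in the workflow are up to date.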

dansondergaard commented 3 months ago

I'll close this for now then :-)

(gwf status needs to ask for metadata on every single file in your workflow to determine if everything is up-to-date, while squeue just gives you the list of jobs and their status).
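
To illustrate why the file metadata matters, here is a minimal sketch of a timestamp-based up-to-date check. It assumes the rule is roughly "a target should run if an output is missing or an input is newer than an output"; the function should_run is hypothetical and is not gwf's actual code, which also takes dependencies between targets into account.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of gwf: does a target look stale?
# Usage: should_run "in1 in2" "out1 out2"  (space-separated lists, no spaces in paths)
should_run() {
    local inputs=$1 outputs=$2 src dst
    for dst in $outputs; do
        # A missing output means the target has to run.
        [ -e "$dst" ] || return 0
    done
    for src in $inputs; do
        for dst in $outputs; do
            # An input newer than an output means the output is out of date.
            [ "$src" -nt "$dst" ] && return 0
        done
    done
    return 1  # every output exists and none is older than an input
}
```

Each check needs a metadata lookup per file, which adds up on a large workflow, and the answer is only as reliable as the timestamps the filesystem reports at that moment.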