reanahub / reana-job-controller

REANA Job Controller
http://reana-job-controller.readthedocs.io/
MIT License

HTCondor: job shows as running but no job sent to HTCondor #235

Closed by diegodelemos 4 years ago

diegodelemos commented 4 years ago

Stems from a chat conversation.

A workflow with HTCondor jobs is reported as running, but condor_q shows no jobs:

$ reana-client logs
....
==> Job logs
==> Step: testing
==> Workflow ID: 1ed75514-b8d5-4146-b991-bc5e05750df9
==> Compute backend: HTCondor
==> Job ID:
==> Docker image: gitlab-registry.cern.ch/awesome-workshop/payload-docker-cms:0950e980
==> Command: cd /home/cmsusr/CMSSW_10_6_8_patch1/src/ && source /cvmfs/cms.cern.ch/cmsset_default.sh && eval `scramv1 runtime -sh` && cd AnalysisCode/ZPeakAnalysis/ && mkdir $TMPDIR/output && export ANALYSIS_OUTDIR=$TMPDIR/output && cmsRun test/MyZPeak_cfg.py
==> Status: running
==> Step testing emitted no logs.

As we can see, the job ID is empty. If an HTCondor job had been created, this field would be populated.
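
To spot this symptom without reading the output by eye, the `reana-client logs` text can be scanned for steps whose "Job ID" field is blank. The following is only a rough sketch: the function name `steps_missing_job_id` is hypothetical, and it assumes the output keeps the `==> Step:` / `==> Job ID:` layout shown above.

```python
import re


def steps_missing_job_id(log_text):
    """Return the names of steps whose 'Job ID' field is empty.

    Hypothetical helper; assumes the '==> Step:' / '==> Job ID:' layout
    seen in the log excerpt above.
    """
    missing = []
    current_step = None
    for line in log_text.splitlines():
        step_match = re.match(r"==> Step:\s*(\S+)", line)
        if step_match:
            # Remember which step the following fields belong to.
            current_step = step_match.group(1)
            continue
        id_match = re.match(r"==> Job ID:\s*(.*)$", line)
        if id_match and not id_match.group(1).strip():
            # An empty Job ID suggests no backend job was recorded.
            missing.append(current_step)
    return missing


sample = """==> Step: testing
==> Workflow ID: 1ed75514-b8d5-4146-b991-bc5e05750df9
==> Compute backend: HTCondor
==> Job ID:
==> Status: running"""

print(steps_missing_job_id(sample))  # ['testing']
```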

REANA-Job-Controller logs:

2020-02-19 13:49:51,466 | root | htcondor_job_monitor | INFO | Starting a new stream request to watch Condor Jobs
2020-02-19 13:49:51,466 | root | ThreadPoolExecutor-0_0 | INFO | Getting schedd: <htcondor._htcondor.Schedd object at 0x7fb5667a6e68>
2020-02-19 13:49:51,466 | root | ThreadPoolExecutor-0_0 | INFO | Querying jobs [None]
2020-02-19 13:49:51,641 | root | htcondor_job_monitor | ERROR | Job with id None was not found in schedd.
2020-02-19 13:49:51,641 | root | ThreadPoolExecutor-0_0 | INFO | Getting schedd: <htcondor._htcondor.Schedd object at 0x7fb5667a6e68>
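
The "Querying jobs [None]" line suggests the monitor was asked to look up a job that never received a backend ID. One defensive pattern, sketched here with hypothetical names and not claimed to be what the actual fix does, is to separate jobs that have a usable backend ID from those that do not before querying the scheduler, so the latter can be reported as failed instead of shown as running.

```python
def split_monitorable(jobs):
    """Split a {job_name: backend_job_id} mapping into two groups.

    Hypothetical helper, not part of reana-job-controller.
    Returns (monitorable, orphaned):
      monitorable: dict of jobs with a non-empty backend ID
      orphaned: list of job names whose backend ID is None or blank
    """
    monitorable = {
        name: job_id
        for name, job_id in jobs.items()
        if job_id is not None and str(job_id).strip()
    }
    orphaned = [name for name in jobs if name not in monitorable]
    return monitorable, orphaned


# "testing" never got a backend ID, so it should not be queried.
monitorable, orphaned = split_monitorable({"testing": None, "fit": "1234.0"})
print(monitorable)  # {'fit': '1234.0'}
print(orphaned)     # ['testing']
```

Only the jobs in `monitorable` would then be passed to the scheduler query; the ones in `orphaned` could be marked as failed at submission time.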

cc @roksys I believe this was solved, can you confirm? thanks :)

tiborsimko commented 4 years ago

This was fixed in https://github.com/reanahub/reana-job-controller/pull/228 and deployed on the REANA production instance at CERN on 25 Feb 2020. Here's a test workflow finishing fine:

[image: htc]