prominence-eosc / prominence

PROMINENCE server
Apache License 2.0

Why does HTCondor sometimes not start jobs? #26

Open alahiff opened 5 years ago

alahiff commented 5 years ago

The startds were created and are fine, but the jobs are not being matched to them. Encountered during scale tests with 40 jobs submitted at the same time.

alahiff commented 5 years ago

Possibly related to many jobs being in the same autocluster. The negotiator gives up checking for matches.

Adding the following to the HTCondor configuration will hopefully resolve the problem:

ADD_SIGNIFICANT_ATTRIBUTES = ProminenceJobUniqueIdentifier
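
To illustrate why this setting might matter, here is a toy Python model of autoclustering. It is only a sketch and not HTCondor's actual implementation: jobs whose significant attributes all have the same values land in one group, and adding a per-job attribute to the significant set splits them apart. The `RequestCpus` and `RequestMemory` values, the `job-N` identifiers and both helper functions are made up for the example.

```python
from collections import defaultdict


def autocluster_key(job_ad, significant_attrs):
    """Build a hashable key from the values of the significant attributes."""
    return tuple((attr, job_ad.get(attr)) for attr in sorted(significant_attrs))


def group_into_autoclusters(job_ads, significant_attrs):
    """Group job identifiers by the key derived from the significant attributes."""
    clusters = defaultdict(list)
    for ad in job_ads:
        key = autocluster_key(ad, significant_attrs)
        clusters[key].append(ad["ProminenceJobUniqueIdentifier"])
    return dict(clusters)


# 40 hypothetical jobs with identical resource requests (assumed values).
jobs = [
    {
        "RequestCpus": 1,
        "RequestMemory": 2048,
        "ProminenceJobUniqueIdentifier": f"job-{i}",
    }
    for i in range(40)
]

# Assumed baseline set of significant attributes for this toy model.
default_attrs = ["RequestCpus", "RequestMemory"]

before = group_into_autoclusters(jobs, default_attrs)
after = group_into_autoclusters(
    jobs, default_attrs + ["ProminenceJobUniqueIdentifier"]
)

print(len(before), len(after))  # 1 40
```

In this model the 40 jobs collapse into a single group until the unique identifier is treated as significant, after which each job is in its own group and would be considered for matching individually.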

alahiff commented 5 years ago

Doesn't help :-( The negotiator seems to stop considering some jobs at all, and just considers jobs that have no startds yet before giving up. Why?

alahiff commented 4 years ago

Haven't seen this for a long time. If it happens again we can consider just bypassing the negotiator, see https://htcondor.readthedocs.io/en/latest/apis/python-bindings/advanced_tutorials/advanced_schedd.html#negotiation-with-the-schedd