perfsonar / pscheduler

The perfSONAR Scheduler
Apache License 2.0

Scheduler non-starts runs for some long tasks #1412

mfeit-internet2 closed this issue 3 months ago

mfeit-internet2 commented 3 months ago

@cs1867 observed non-starts for 24-hour latencybg when a schedule is specified.

This appears to be an edge case in the scheduler. Slip should never be reduced to a negative value. We also need to see why a proposed run range identical to the possible run range isn't treated as common.
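For reference, the `-1 day, 23:59:58` value in the scheduler log is how Python renders a `timedelta` of minus two seconds. The sketch below shows that rendering and one way to keep a chopped slip from going negative. `chop_slip` and its `room` parameter are hypothetical names for illustration, not pScheduler's actual code.

```python
from datetime import timedelta

# Python normalizes negative timedeltas to a negative day count plus a
# positive remainder, so minus two seconds prints as "-1 day, 23:59:58".
negative_slip = timedelta(seconds=-2)
rendered = str(negative_slip)


def chop_slip(slip, room):
    """Hypothetical helper: limit slip to the room available, never below zero."""
    return max(timedelta(0), min(slip, room))


configured = timedelta(minutes=3)                        # "slip": "PT3M"
chopped = chop_slip(configured, timedelta(seconds=-2))   # clamps to zero
print(rendered, "|", chopped)
```

With a clamp like this, a task that has no room to slip would be scheduled with zero slip instead of a negative one.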

Relevant parts of mesh config:

{
    "groups": {
        "group_mesh_wash": {
            "type": "mesh",
            "addresses": [
                { "name": "foo" },
                { "name": "bar" }
            ]
        }
    },
    "tests": {
        "latency_test": {
            "type": "latencybg",
            "spec": {
                "source": "{% address[0] %}",
                "dest": "{% address[1] %}"
            }
        }
    },
    "schedules": {
        "schedule_P1D": {
            "repeat": "P1D",
            "sliprand": false,
            "slip": "PT3M"
        }
    },
    "tasks": {
        "latencybg_wash": {
            "group": "group_mesh_wash",
            "test": "latency_test",
            "schedule": "schedule_P1D",
            "archives": [ "http_archive" ],
            "reference": {
                "display-task-name": "Wash Loss Tests"
            }
        }
    }
}
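To make the schedule values concrete, here is a minimal sketch that converts the two ISO 8601 durations from the config into Python `timedelta` objects. `parse_duration` is a hypothetical helper covering only weeks, days, hours, minutes and seconds. It is not the parser pScheduler uses.

```python
import re
from datetime import timedelta

# Covers only the W/D/H/M/S subset of ISO 8601 durations (no years or months).
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Hypothetical helper: turn a string such as 'PT3M' into a timedelta."""
    match = _DURATION.match(text)
    parts = {}
    if match is not None:
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError("unsupported duration: %r" % text)
    return timedelta(**parts)


repeat = parse_duration("P1D")   # one day between runs
slip = parse_duration("PT3M")    # up to three minutes of allowed slip
print(repeat, "|", slip)
```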

Relevant bit of task as posted by pSConfig:

    "schedule": {
        "repeat": "P1D",
        "slip": "PT3M",
        "sliprand": false,
        "until": "2024-03-13T19:58:45Z"
    },
    "schema": 1,
    "test": {
        "spec": {
            "dest": "140.208.255.240",
            "schema": 1,
            "source": "137.75.71.160"
        },
        "type": "latencybg"
    },
    "tool": "powstream"

Log from scheduler:

scheduler DEBUG    scheduler-pool-3: 3: Started worker runner
scheduler DEBUG    3: Participant list is ['xxx']
scheduler DEBUG    3: Binding from None
scheduler DEBUG    3: All participants are up.
scheduler DEBUG    3: Task URLs are ['https://xxx/pscheduler/tasks/45ed0f1b-41aa-4902-bfb3-9f333968dcbb']

vvvvv  This is the interesting part:
scheduler DEBUG    3: Chopped slip to -1 day, 23:59:58
^^^^^

scheduler DEBUG    3: Possible run range {'start': '2024-03-11T19:58:55Z', 'end': '2024-03-12T19:58:55Z'}
scheduler DEBUG    3: Trying to schedule with priority None
scheduler DEBUG    3: Fetching proposals from https://xxx/pscheduler/tasks/45ed0f1b-41aa-4902-bfb3-9f333968dcbb/runtimes
scheduler DEBUG    3: Ranges: [R(2024-03-11 19:58:55+00:00..2024-03-12 19:58:55+00:00)]
scheduler DEBUG    3: Done fetching time ranges

vvvvv Also interesting because the one proposed range matches the possible range exactly.
scheduler DEBUG    3: Ranges in common: []
^^^^^
scheduler DEBUG    3: Unable: No times available for this run.
scheduler INFO     3: Posting non-starting run at 2024-03-11T19:58:50Z for task 45ed0f1b-41aa-4902-bfb3-9f333968dcbb: No times available for this run.
scheduler DEBUG    3: Thread finished
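As a sanity check on the second annotated line, a plain interval intersection of the possible run range with the single proposed range from the log should return that same range, not an empty list. This is a minimal sketch; `intersect` and `parse` are hypothetical helpers and do not reflect how the scheduler actually computes common ranges.

```python
from datetime import datetime, timezone


def parse(stamp):
    """Parse a UTC timestamp in the form used by the scheduler log."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def intersect(a, b):
    """Return the overlap of two (start, end) ranges, or None if they are disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


# Both ranges are copied from the log lines above.
possible = (parse("2024-03-11T19:58:55Z"), parse("2024-03-12T19:58:55Z"))
proposed = (parse("2024-03-11T19:58:55Z"), parse("2024-03-12T19:58:55Z"))

common = intersect(possible, proposed)
print(common == possible)
```

Since a straightforward intersection keeps the full range, the empty result in the log suggests something else, possibly the negative slip, is shrinking or invalidating the range before the comparison. That is an inference from the log and not something the source confirms.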