grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

DST starting/ending causes overlap/gap in schedules #5247

Open zackman0010 opened 4 days ago

zackman0010 commented 4 days ago

What went wrong?

When using the API to create a schedule in a time zone that observes DST, the schedule has a one-hour gap when DST ends and a one-hour overlap when DST begins.

How do we reproduce it?

1) Create a weekly rotating shift: POST {baseUrl}/api/v1/on_call_shifts

{
    "name": "Test API Shift",
    "type": "rolling_users",
    "start": "2024-10-01T09:00:00",
    "duration": 604800,
    "frequency": "weekly",
    "week_start": "MO",
    "rolling_users": [["SomeUserId"]],
    "start_rotation_from_user_index": 0
}

2) Create a schedule using the shift created in step 1. A time zone that observes DST must be used: POST {baseUrl}/api/v1/schedules

{
    "name": "Test API Schedule",
    "type": "calendar",
    "time_zone": "America/Chicago",
    "enable_web_overrides": true,
    "shifts": ["IdOfShiftFromStep1"]
}

3) In Grafana, look at the scheduled shifts around the date DST ends (such as November 3, 2024 for America/Chicago). You should see a one-hour gap.

4) In Grafana, look at the scheduled shifts around the date DST begins (such as March 9, 2025 for America/Chicago). You should see a one-hour overlap.
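As a rough illustration of why a fixed 604800-second duration can produce the gap and overlap from steps 3 and 4, here is a minimal sketch using only the Python standard library. It is not OnCall's code: the helper name `boundary_mismatch` is hypothetical, and the 9:00 AM Sunday boundaries are illustrative dates chosen to straddle the DST changes, not necessarily the boundaries OnCall generates for the payload above.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Same duration as in the shift payload: exactly 7 * 24 hours.
SHIFT_DURATION = timedelta(seconds=604800)


def boundary_mismatch(local_start: datetime, tz_name: str) -> timedelta:
    """Hypothetical helper: time between the end of one shift and the
    wall-clock start of the next one a week later.

    Positive result = gap, negative result = overlap, zero = seamless.
    """
    tz = ZoneInfo(tz_name)
    start = local_start.replace(tzinfo=tz)
    # The next shift starts at the same wall-clock time one week later.
    next_start = (local_start + timedelta(days=7)).replace(tzinfo=tz)
    # The current shift ends after a fixed amount of elapsed time,
    # so do that addition in UTC.
    end_utc = start.astimezone(timezone.utc) + SHIFT_DURATION
    return next_start.astimezone(timezone.utc) - end_utc


# Week containing the end of DST (a 169-hour local week): one-hour gap.
fall = boundary_mismatch(datetime(2024, 10, 27, 9, 0), "America/Chicago")
# Week containing the start of DST (a 167-hour local week): one-hour overlap.
spring = boundary_mismatch(datetime(2025, 3, 2, 9, 0), "America/Chicago")

print(fall)    # 1:00:00
print(spring)  # -1 day, 23:00:00 (i.e. minus one hour)
```

A week without a DST change returns a zero `timedelta`, which matches the observation that the problem only appears around the two DST transitions.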

Grafana OnCall Version

Cloud (Plugin management says v1.14.1 is installed)

Product Area

Schedules

Grafana OnCall Platform?

I use Grafana Cloud

User's Browser?

No response

Anything else to add?

I found PR #4103, which says it fixes schedule gaps, but it appears to have fixed only the visual gaps that occur when a schedule is in UTC and the user's time zone is not. In that situation there is no actual schedule gap, because the schedule is in UTC, so the only issue was a visual bug. In this case, I believe there actually is a one-hour gap.

zackman0010 commented 2 days ago

Looking into it, the best fix requires updating the recurring_ical_events dependency to a newer version. The version currently in use does not call pytz's .normalize function on the end date, which leaves it as an invalid date (i.e., Nov 3, 2024 9:00 AM CDT instead of the correct 8:00 AM CST). However, some of the tests are currently failing with the updated dependencies. I'm looking into what exactly changed, but in the meantime I also have an alternate fix that doesn't involve updating the dependencies.

Fix with dependency update (Failing tests): https://github.com/zackman0010/oncall/commit/d918955c99b44846ab15a15872df2a358d296f51

Fix without dependency update: https://github.com/zackman0010/oncall/commit/19f6cc6e89930394ed4a9a84447aa8bfc1ab85df
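For readers unfamiliar with the un-normalized end date described above, the following sketch shows the general effect using only the standard library. It is an analogue, not the code from recurring_ical_events or from either commit: a fixed-offset `timezone` named `CDT` stands in (as an assumption) for the offset-specific tzinfo that pytz attaches to a localized datetime, and `astimezone` plays the role that pytz's `normalize` would play.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: a fixed -05:00 offset stands in for the CDT tzinfo that a
# pytz-localized datetime would carry.
CDT = timezone(timedelta(hours=-5), "CDT")
chicago = ZoneInfo("America/Chicago")

start = datetime(2024, 10, 27, 9, 0, tzinfo=CDT)

# Arithmetic keeps the old offset, so the result is labelled CDT on a
# date when Chicago is already back on CST.
stale_end = start + timedelta(days=7)

# Re-expressing the same instant in the real zone gives the valid local time.
normalized_end = stale_end.astimezone(chicago)

print(stale_end.isoformat())       # 2024-11-03T09:00:00-05:00
print(normalized_end.isoformat())  # 2024-11-03T08:00:00-06:00
print(stale_end == normalized_end) # True: same instant, different wall clock
```

Both values name the same instant, so the difference only matters to code that reads the wall-clock fields or the offset directly from the stale value.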