google / slo-generator

SLO Generator computes SLIs, SLOs, Error Budgets and Burn Rates from supported backends, then exports an SLO report to supported targets.
Apache License 2.0
489 stars 78 forks source link

🐛 [BUG] - Unit tests for cloud Monitoring backend fail on master outside UTC systems #308

Closed mveroone closed 1 year ago

mveroone commented 1 year ago

SLO Generator Version

Latest master branch

Python Version

3.7.7

What happened?

When following the contributor guide and running unit tests before doing any code change, I have a fail, at least running it locally.

What did you expect?

100% of tests success

Screenshots

No response

Relevant log output

___________________________________________________ TestCloudMonitoringMqlBackend.test_enrich_query_with_time_horizon_and_period ___________________________________________________

self = <test_cloud_monitoring_mql.TestCloudMonitoringMqlBackend testMethod=test_enrich_query_with_time_horizon_and_period>

        def test_enrich_query_with_time_horizon_and_period(self):
            timestamp: float = 1666995015.5144777  # = 2022/10/28 22:10:15.5144777
            window: int = 3600  # in seconds
            query: str = """fetch gae_app
    | metric 'appengine.googleapis.com/http/server/response_count'
    | filter resource.project_id == 'slo-generator-demo'
    | filter
        metric.response_code == 429
        || metric.response_code == 200
    """

            enriched_query = """fetch gae_app
    | metric 'appengine.googleapis.com/http/server/response_count'
    | filter resource.project_id == 'slo-generator-demo'
    | filter
        metric.response_code == 429
        || metric.response_code == 200
    | group_by [] | within 3600s, d'2022/10/28 22:10:15' | every 3600s"""

>           assert (
                CloudMonitoringMqlBackend.enrich_query_with_time_horizon_and_period(
                    timestamp, window, query
                )
                == enriched_query
            )
E           AssertionError: assert 'fetch gae_ap...| every 3600s' == 'fetch gae_ap...| every 3600s'
E             Skipping 237 identical leading characters in diff, use -v to show
E             - '2022/10/28 22:10:15' | every 3600s
E             ?           ^^^^
E             + '2022/10/29 00:10:15' | every 3600s
E             ?           ^^^^

tests/unit/backends/test_cloud_monitoring_mql.py:42: AssertionError

Code of Conduct

mveroone commented 1 year ago

OK, my bad, after reading it a dozen times, now I understand what test is failing. There's a 2 hour difference here because this particular timestamp occurs during the DST time of my timezone, namely CEST, which is UTC+2.

So the "problem" here is that the test will only succeed if the system bears an UTC localetime.

To make it agnostic on the local time, I suggest this change (I'm in the process of signing the CLA but haven't yet, otherwise I'd have made a PR) :

index 419e20f..ccd79ae 100644
--- a/slo_generator/backends/cloud_monitoring_mql.py
+++ b/slo_generator/backends/cloud_monitoring_mql.py
@@ -20,7 +20,7 @@ import pprint
 import typing
 import warnings
 from collections import OrderedDict
-from datetime import datetime
+from datetime import timezone, datetime
 from typing import List, Optional, Tuple

 from google.api.distribution_pb2 import Distribution
@@ -240,7 +240,7 @@ class CloudMonitoringMqlBackend:
         # epoch, in UTC, with decimal part representing nanoseconds.
         # MQL expects dates formatted like "%Y/%m/%d %H:%M:%S" or "%Y/%m/%d-%H:%M:%S".
         # Reference: https://cloud.google.com/monitoring/mql/reference#lexical-elements
-        end_time_str: str = datetime.fromtimestamp(timestamp).strftime(
+        end_time_str: str = datetime.fromtimestamp(timestamp).astimezone(tz=timezone.utc).strftime(
             "%Y/%m/%d %H:%M:%S"
         )
         query_with_time_horizon_and_period: str = (
lvaylet commented 1 year ago

Hi @mveroone,

Thanks a lot for raising this issue!

You are absolutely right. Tests should not depend on the geographical location of the contributors or the time zone configured on their machine.

Your change makes perfect sense, and is a nice fit for a first PR.

Please let me know if I can help with the CLA, then feel free to open the PR.

Laurent

mveroone commented 1 year ago

Thanks @lvaylet I'm just trying to find someone here from Legal who can confirm I'm allowed to sign that and who's the person in charge to counter-sign it. Shouldn't take more than a couple business days. Once it's signed I'll submit the PR (and I have a significantly bigger one in the works for a new backend)

Maxime

lvaylet commented 1 year ago

Sounds good to me. Test as much as possible locally, using and abusing make format, make lint and make test to detect potential problems as early as possible (= before the time-consuming CI pipeline).

I will be on vacation over the next two weeks but I will take some time to review your PRs if necessary.

lvaylet commented 1 year ago

@mveroone I will proceed and fix this bug myself as several people have reported it.

@Lawouach I am not sure I understand the differences between fromtimestamp(timestamp).astimezone(tz=timezone.utc) (Maxime's approach) and fromtimestamp(timestamp, tz=timezone.utc) (your approach). Any insight to share? I just confirmed they both fix the bug and provide the expected results.

mveroone commented 1 year ago

Sorry about this. CLA signature should be good next week, but yeah, I guess my first PR will be Splunk backend integration.

For your question, I'm not experienced enough on that to weight in on either solution. I'll let @lawouach give it a try

Lawouach commented 1 year ago

Good question. Glancing at the code, it seems there is no outcome difference between the two. Both translate to UTC and set the tz field. @lvaylet I think it's totally fine to go with whichever you feel happy with :)

lvaylet commented 1 year ago

Thanks guys. I will move forward as soon as #332 is fixed (introduced by the latest version of Pylint).