tl-its-umich-edu / my-learning-analytics

My Learning Analytics (MyLA)
Apache License 2.0
36 stars 39 forks source link

MyLA importing incorrect timezone times #1484

Closed cosimps closed 1 year ago

cosimps commented 1 year ago

Expected behavior (A description of what you expected to happen) :

Timestamps for locally computed times, such as the grade_posted_local_date field in the submission table, to have the correctly computed time for the America/Detroit configured timezone.

Describe the bug (Tell us what happens instead of the expected behavior) :

Timestamps are showing as 4 hours ahead of UTC instead of 5 hours behind for the correct America/Detroit (EST) time.

Steps to Reproduce :

  1. Select an assignment from the MyLA database:
    SELECT id FROM assignment ORDER BY due_date DESC LIMIT 1;

    Response:

    -- response --
    +-----------------+
    | id              |
    +-----------------+
    | 420000011111111 |
    +-----------------+
  2. Select a user of the assignment:
    SELECT
        assignment_id,
        user_id,
        grade_posted_local_date
    FROM submission
    WHERE assignment_id = 420000011111111
        AND user_id = 420000002222222;

    Response:

    +-----------------+-----------------+----------------------------+
    | assignment_id   | user_id         | grade_posted_local_date    |
    +-----------------+-----------------+----------------------------+
    | 420000011111111 | 420000002222222 | 2021-10-17 20:49:11.908000 |
    +-----------------+-----------------+----------------------------+

    This provides the following information:

    • long user id: 420000002222222
    • short user id: 2222222 (dropping the 42000000 prefix)
    • assignment id: 420000011111111

The rest of the queries below are against one of the Unizin host UDP context_store.

  1. Fetch the learner_activity_result_id for the assignment:
    SELECT
        entity.learner_activity_result.learner_activity_result_id,
        keymap.learner_activity.lms_int_id,
        keymap.person.lms_ext_id
    FROM entity.learner_activity_result
    LEFT JOIN keymap.person ON keymap.person.id = entity.learner_activity_result.person_id
    LEFT JOIN keymap.learner_activity ON keymap.learner_activity.id = entity.learner_activity_result.learner_activity_id
    WHERE cast(keymap.person.lms_ext_id AS bigint) = 2222222                      -- short user id
        AND cast(keymap.learner_activity.lms_int_id AS BIGINT) = 420000011111111; -- assignment id

    Response:

    learner_activity_result_id |   lms_int_id    | lms_ext_id 
    ----------------------------+-----------------+------------
                    333333333 | 420000011111111 | 2222222
  2. Now view the posted_at field (in the way that it is queried here by MyLA):
    SELECT 
        learner_activity_result_id,
        posted_at,
        posted_at AT TIME ZONE 'America/Detroit' AS "posted_at (grade_posted_local_date)"
    FROM entity.learner_activity_result
    WHERE learner_activity_result_id = 333333333;

    This shows the wrong time being calculated (grade_posted_local_date in the MyLA query):

     learner_activity_result_id |        posted_at        | posted_at (grade_posted_local_date)        
    ----------------------------+-------------------------+------------------------------------
                      333333333 | 2021-10-17 16:49:11.908 |          2021-10-17 20:49:11.908+00

The context_store's timestamps are in UTC.

show timezone;
 TimeZone 
----------
 UTC

The America/Detroit (or EST) timezone is 5 hours behind UTC, not the 4 hours head that is returned by the queries:

select * from pg_timezone_names where name = 'America/Detroit';
      name       | abbrev | utc_offset | is_dst 
-----------------+--------+------------+--------
 America/Detroit | EST    | -05:00:00  | f
SELECT CURRENT_TIME AT TIME ZONE 'UTC', CURRENT_TIME AT TIME ZONE 'EST', CURRENT_TIME AT TIME ZONE 'America/Detroit';
      timezone      |      timezone      |      timezone      
--------------------+--------------------+--------------------
 21:58:56.198633+00 | 16:58:56.198633-05 | 16:58:56.198633-05

grade_posted_local_date should return 2021-10-17 11:49:11.908+00.

The following contains multiple ways the queries can be corrected to return to correct timestamp:

SELECT 
    posted_at,
    posted_at AT TIME ZONE 'America/Detroit' AS "posted_at (original)",                           -- incorrect
    posted_at AT TIME ZONE 'UTC' AS "posted_at (UTC)",                                            -- shows time is unaffected by converting into UTC
    posted_at AT TIME ZONE 'UTC' AT TIME ZONE 'America/Detroit' AS "posted_at (America/Detroit)", -- incorrect, original way
    posted_at AT TIME ZONE 'UTC' AT TIME ZONE 'EST' AS "posted_at (EST)",                         -- correct time
    posted_at::timestamp WITH TIME ZONE AT TIME ZONE 'EST' AS "posted_at (option 2)",             -- correct time
    posted_at::timestamptz AT TIME ZONE 'EST' AS "posted_at (option 3)"                           -- correct time
FROM entity.learner_activity_result
WHERE learner_activity_result_id = 333333333;
        posted_at        |    posted_at (original)    |      posted_at (UTC)       | posted_at (America/Detroit) |  posted_at (option 1)   |  posted_at (option 2)   |  posted_at (option 3)   
-------------------------+----------------------------+----------------------------+-----------------------------+-------------------------+-------------------------+-------------------------
 2021-10-17 16:49:11.908 | 2021-10-17 20:49:11.908+00 | 2021-10-17 16:49:11.908+00 | 2021-10-17 12:49:11.908     | 2021-10-17 11:49:11.908 | 2021-10-17 11:49:11.908 | 2021-10-17 11:49:11.908

Additional context (Add any other context about the problem here) :

Affected queries:

cosimps commented 1 year ago

grade_posted_local_date should return 2021-10-17 11:49:11.908+00.

It was also noticed that converting the time with the timezone's name (America/Detroit) the incorrect time was still being computed. It was returning a time of only 4 hours behind instead of the correct 5 hours behind. Switching to using the timezone's abbreviation instead does provide the correct value.

This is incorrect. On 2021-10-17, America/Detroit was running in EDT, not EST. It is currently in EST thus the perceived discrepancy.

Here are the updated query options to resolve this:

SELECT 
    posted_at,
    posted_at AT TIME ZONE 'America/Detroit' AS "posted_at (original)",                           -- incorrect
    posted_at AT TIME ZONE 'UTC' AS "posted_at (UTC)",                                            -- shows time is unaffected by converting into UTC
    posted_at AT TIME ZONE 'UTC' AT TIME ZONE 'America/Detroit' AS "posted_at (America/Detroit)", -- correct time
    posted_at::timestamp WITH TIME ZONE AT TIME ZONE 'America/Detroit' AS "posted_at (option 2)", -- correct time
    posted_at::timestamptz AT TIME ZONE 'America/Detroit' AS "posted_at (option 3)"               -- correct time
FROM entity.learner_activity_result
WHERE learner_activity_result_id = 333333333;

credit to @dcgleim for figuring this part out

jonespm commented 1 year ago

Thanks for the report! I've typically preferred to use a specific region like America/Detroit over EST as I believe that adjusts for DST. We also have another issue at #1432 to pass the value configure in the environment into the cron. Do you know if you configure this differently for schools in other time zones and if it would help them? Only the date is shown (not the time) so it shouldn't have a large user impact off by a few hours but it still could if they had something due at midnight. All of these options look good.

cosimps commented 1 year ago

@jonespm I put the info about our configuration in the other issue (here).

We would prefer to set the institutions' timezones for their MyLA install so that there are not discrepancies like the midnight scenario you described. I foresee that being a feature request/support issue from them down the line if we only use one standard North American timezone.

Best option would probably be to only store the datetime as UTC and translate it to the viewer's local time on the frontend or at request time, but that solution is beyond me. We are 100% good with it just being configurable for each instance of MyLA.

jonespm commented 1 year ago

Thanks, I forget why we were doing the conversion it in the cron rather than pulling down "as-is". This looks like it's been in the code since the beginning and I agree probably should be as you describe.

Maybe that's something whoever fixes these two issues to look at. I'd expect these will both be in the next release.

jonespm commented 1 year ago

I'm looking at converting to just saving the UTC time locally and doing the conversion there, trying to see if there was some reason we were doing it in the query. This might have to be a separate fix as it's going to be more work changing the migrations, the API's and the frontend.

For the immediate fix if we keep it in the query, It looks like we might have to use the timezone function, which supports SQLAlchemy parameter replacement. This returns the same results as your other examples.

timezone('America/Detroit', posted_at AT TIME ZONE 'UTC' ) AS "posted_at (timezone function)"

pushyamig commented 1 year ago

I changed the TZ value in .env to `"America/Chicago" and ran the cron, I see that assignment and submission tables where local data is present, it populated correct date based on timezone.

Test passes