populationgenomics / metamist

Sample level metadata system
MIT License
1 stars 1 forks source link

When updating Analysis record only set timestamp_completed value if it's not already set. #872

Closed milo-hyben closed 1 month ago

milo-hyben commented 1 month ago

This PR is fixing the issue with timestamp_completed being updated even if it has been set prior. This is causing issues with seqr aggregation as it can not find any completed cram types prior to May 18, 2022 With more details reported in the close PR: https://github.com/populationgenomics/metamist/pull/871

There are 5614 records with wrong timestamp_completed.

There are 3 steps to fix this issue:

  1. Apply this PR
  2. Backup the records to be fixed with this sql:
CREATE TABLE analysis_backup_prior_timestamp_completed_update AS
SELECT analysis.*
FROM analysis
INNER JOIN (
SELECT id, MIN(timestamp_completed) as timestamp_completed 
FROM analysis FOR SYSTEM_TIME ALL 
WHERE timestamp_completed IS NOT NULL
GROUP BY id
) as s
on s.id = analysis.id AND analysis.timestamp_completed <> s.timestamp_completed;
  1. Update existing records with the sql:
CREATE TABLE tmp_analysis_new_timestamp_completed AS
SELECT s.id, s.timestamp_completed as new_timestamp_completed
FROM analysis a
INNER JOIN (
SELECT id, MIN(timestamp_completed) as timestamp_completed 
FROM analysis FOR SYSTEM_TIME ALL 
WHERE timestamp_completed IS NOT NULL
GROUP BY id
) as s
on s.id = a.id AND a.timestamp_completed <> s.timestamp_completed;

UPDATE analysis
INNER JOIN tmp_analysis_new_timestamp_completed s
on s.id = analysis.id AND analysis.timestamp_completed <> s.new_timestamp_completed
SET analysis.timestamp_completed = s.new_timestamp_completed
WHERE analysis.id in (SELECT id from tmp_analysis_new_timestamp_completed);
codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 80.67%. Comparing base (68e57d1) to head (59e5322). Report is 1 commits behind head on dev.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #872 +/- ## ========================================== + Coverage 80.49% 80.67% +0.17% ========================================== Files 172 172 Lines 14540 14551 +11 ========================================== + Hits 11704 11739 +35 + Misses 2836 2812 -24 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.