hodcroftlab / covariants

Real-time updates and information about key SARS-CoV-2 variants, plus the scripts that generate this information.
https://covariants.org/
GNU Affero General Public License v3.0

Move clusters to 2 week intervals (like countries) #295

Closed MoiraZuber closed 2 years ago

MoiraZuber commented 2 years ago

All use of weekly intervals has been removed from allClusterDynamics_faster.py; both the "per country" and "per variant" datasets are now sorted into 2-week intervals. Additionally, the to2week(x) function has been replaced with to2week_ordinal(x), which does not use isocalendar to break dates down into their respective biweekly intervals, but instead relates them to a "reference Monday" and simply counts the difference in days. This should avoid future New Year "how many weeks per year" issues.
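
For readers unfamiliar with the approach, here is a minimal sketch of binning by day difference from a reference Monday. It is not the code from allClusterDynamics_faster.py: the name two_week_bin_start, the chosen reference date, and the return value (the bin's starting Monday rather than the (year, week) tuple that to2week_ordinal(x) returns) are all assumptions made for illustration.

```python
import datetime

# Hypothetical reference date; any Monday would do. 2020-01-06 is a Monday.
REFERENCE_MONDAY = datetime.date(2020, 1, 6)


def two_week_bin_start(day, reference=REFERENCE_MONDAY):
    """Return the Monday that starts the 2-week bin containing `day`.

    Bins are counted in whole days from `reference`, so the year boundary
    and the 52-vs-53 ISO weeks question never enter the calculation.
    """
    offset_days = day.toordinal() - reference.toordinal()
    # Floor division keeps bins aligned for dates before the reference too.
    bin_index = offset_days // 14
    return reference + datetime.timedelta(days=14 * bin_index)


# Dates on either side of New Year fall into the same bin.
late_december = two_week_bin_start(datetime.date(2020, 12, 28))
early_january = two_week_bin_start(datetime.date(2021, 1, 3))
next_bin = two_week_bin_start(datetime.date(2021, 1, 4))
```

Because the bin index depends only on the day count, two dates are grouped together exactly when they lie in the same 14-day window, regardless of which calendar year each belongs to.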

TODOs:

vercel[bot] commented 2 years ago

Name        Status    Updated
covariants  ✅ Ready  Jul 20, 2022 at 3:31PM (UTC)

emmahodcroft commented 2 years ago

This is a reminder to check, once smoothing is removed and plotting is switched to 2-week intervals, how we are handling zeros and very small numbers. True zeros should be shown as '-' or not displayed at all. Very small numbers that would otherwise be rounded to 0 (e.g. 0.000988) should be shown as <0.01% or similar.

See #255 for full details, and potential changes to code if needed to accomplish this!

emmahodcroft commented 2 years ago

As an update to the message above: the decision about how to distinguish 'real 0s' (e.g. 0 sequences out of 2000 are Alpha, so we have some confidence there is no Alpha) from 'no data' (e.g. 0/0 sequences are Alpha) should be moved elsewhere. Currently both plot as '-'.

This should be part of a bigger rethink about how to display Per Variants, possibly cutting down on the number of variants & countries shown by default.

Currently 'very small numbers' and 'real 0/no data' are also not distinguished, and they should be. But this should be fixed by the move to the new libraries/plotting in #319.
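
To make the three cases concrete, here is a rough sketch of a label formatter that keeps 'no data', 'real 0' and 'very small number' apart. It is only an illustration, not code from this repository or from #319: the function name format_share, the "no data" label, and the 0.01% threshold parameter are assumptions.

```python
def format_share(count, total, min_percent=0.01):
    """Format count/total as a percentage label, keeping three cases apart.

    - total == 0: nothing was sequenced, so there is no data to report
    - count == 0: a real zero (e.g. 0 of 2000 sequences)
    - tiny share: non-zero, but it would round to 0 at two decimals
    """
    if total == 0:
        return "no data"  # hypothetical label; the site currently shows '-'
    if count == 0:
        return "-"
    percent = 100.0 * count / total
    if percent < min_percent:
        return f"<{min_percent}%"
    return f"{percent:.2f}%"


labels = [
    format_share(0, 0),          # no sequences at all
    format_share(0, 2000),       # real zero
    format_share(1, 1_000_000),  # very small but non-zero
    format_share(500, 2000),     # ordinary value
]
```

Taking the raw count and total, rather than a precomputed frequency, is what lets the 0/0 case be told apart from 0/2000; a frequency alone cannot carry that distinction.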


In response to @MoiraZuber's list above (each item is followed by my reply):

  1. The to2week_ordinal(x) function currently needs to return the dates as a (year, week) tuple, as this format is required further downstream. Was it done that way because of the New Year issues, so that it could now be changed back to normal dates, or is there another reason for this?
     Reply: I don't 100% remember why we did this, but my fear is that it is an integral part of how the code indexes things, so while it's working I'm hesitant to change it. Perhaps this can be a separate PR, or part of the refactor, if we think it's important to change? (Happy to chat about this more if needed.)
  2. Can smoothing (which was used for weekly intervals) now be dropped? (And if yes, where in the script?)
     Reply: From my reading, I think smoothing was dropped with the code that you already deleted/replaced. If you think otherwise, please shout!
  3. What adjustments are needed on the web side of things to accommodate 2-week intervals for clusters?
     Reply: I think I got this all sorted out with Ivan now!