nextstrain / forecasts-ncov

SARS-CoV-2 variant growth rates and frequency forecasts
https://nextstrain.org/sars-cov-2/forecasts/

Reduce location count thresholds #99

Closed trvrb closed 3 months ago

trvrb commented 4 months ago

This PR lowers the location count threshold (i.e., the number of sequences collected in the past 30 days) from 100 to 50 for clade-level analysis and from 300 to 150 for lineage-level analysis.

With current data, this increases the number of locations included for clades from 8 to 11.

With current data, this increases the number of locations included for lineages from 5 to 7.

To support these thresholds, I looked at location counts for the different countries analyzed in bedford.io/papers/abousamra-ncov-forecasting-fit/ to get specific count thresholds. We see:

I believe this suggests that a threshold of 50 sequences in the previous 30 days should be roughly consistent with a ~10% forecasting error. This seems like an okay threshold for public display.

It's less certain what count threshold to use for lineages, where we have a significantly larger number of labels than we do for clades. Keeping a 3x ratio here for now.
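For readers who want the rule spelled out, here is a minimal sketch of the filtering described above. It is not the repository's actual code: the constant and function names (`CLADE_MIN_SEQS`, `recent_counts`, `included_locations`) are hypothetical, the inclusive `>=` comparison is an assumption, and the example records with placeholder locations "A", "B" and "C" are invented purely for illustration.

```python
from datetime import date, timedelta

# Thresholds proposed in this PR. The names are hypothetical.
CLADE_MIN_SEQS = 50
LINEAGE_TO_CLADE_RATIO = 3  # the "3x ratio" kept for now
LINEAGE_MIN_SEQS = CLADE_MIN_SEQS * LINEAGE_TO_CLADE_RATIO  # 150
WINDOW_DAYS = 30


def recent_counts(records, today, window_days=WINDOW_DAYS):
    """Sum sequences per location collected in the last `window_days` days.

    `records` is an iterable of (location, collection_date, n_sequences).
    """
    cutoff = today - timedelta(days=window_days)
    counts = {}
    for location, collected, n in records:
        if cutoff <= collected <= today:
            counts[location] = counts.get(location, 0) + n
    return counts


def included_locations(records, today, min_seqs):
    """Return locations whose recent count meets the threshold.

    Assumption: the comparison is inclusive (count >= min_seqs).
    """
    counts = recent_counts(records, today)
    return sorted(loc for loc, n in counts.items() if n >= min_seqs)


# Invented example data, not real sequence counts.
today = date(2024, 1, 31)
records = [
    ("A", date(2024, 1, 20), 160),
    ("B", date(2024, 1, 10), 60),
    ("B", date(2023, 11, 1), 500),  # outside the 30-day window, ignored
    ("C", date(2024, 1, 25), 40),
]

clade_locations = included_locations(records, today, CLADE_MIN_SEQS)
lineage_locations = included_locations(records, today, LINEAGE_MIN_SEQS)
print("clades:", clade_locations)
print("lineages:", lineage_locations)
```

In this toy data, location "B" only qualifies for clade-level analysis under the new threshold of 50; under the old threshold of 100 it would have been excluded.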

[Figure: recent-sequences-Trinidad and Tobago]

[Figure: recent-sequences-Vietnam]

[Figure: recent-sequences-South Africa]

joverlee521 commented 4 months ago

I pushed up a small change to match the threshold numbers in the viz app.

trvrb commented 3 months ago

> I pushed up a small change to match the threshold numbers in the viz app.

Thanks for the catch @joverlee521. I'm going to go ahead and merge this now.