nextstrain / forecasts-ncov

SARS-CoV-2 variant growth rates and frequency forecasts
https://nextstrain.org/sars-cov-2/forecasts/

Force include locations #64

Open corneliusroemer opened 11 months ago

corneliusroemer commented 11 months ago

Description of proposed changes

Current include/exclude settings don't allow for countries to be force included.

This PR adds that ability, symmetrically to exclude.

Because we of course don't want to force include countries that have very little data, a force include is ignored if the country meets less than 10% of the normal include requirement.
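To make the rule concrete, here is a minimal sketch of the selection logic described above. It is not the code in this PR: `select_locations`, `seq_counts`, `min_seqs`, `force_include`, `exclude` and `force_floor` are hypothetical names, and letting exclude take precedence over force include is an assumption.

```python
def select_locations(seq_counts, min_seqs, force_include=(), exclude=(), force_floor=0.10):
    """Return the sorted locations to keep (illustrative sketch, hypothetical names).

    seq_counts    -- dict mapping location name to its sequence count
    min_seqs      -- normal minimum number of sequences for inclusion
    force_include -- locations to include even if below min_seqs
    exclude       -- locations to always drop (assumed to win over force_include)
    force_floor   -- a force include is ignored below this fraction of min_seqs
    """
    selected = []
    for location, count in seq_counts.items():
        if location in exclude:
            continue
        meets_normal = count >= min_seqs
        # A forced location still needs at least force_floor * min_seqs sequences.
        meets_forced = location in force_include and count >= force_floor * min_seqs
        if meets_normal or meets_forced:
            selected.append(location)
    return sorted(selected)


# Made-up counts, purely for illustration.
example_counts = {"USA": 5000, "Brazil": 300, "Peru": 40, "Japan": 900}
example_selection = select_locations(
    example_counts,
    min_seqs=500,
    force_include={"Brazil", "Peru"},
    exclude={"Japan"},
)
print(example_selection)
```

With these made-up counts, Brazil is kept because it is forced and above the 10% floor, while Peru is forced but falls below the floor and is dropped.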

I guess instead of having a simple force include, one could have a dict that gives certain countries a laxer include requirement: e.g. Brazil could be included if it has at least 40% of the sequences required of non-force-included countries.
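A rough sketch of that dict-based alternative, assuming a mapping from country to the fraction of the normal requirement it has to meet. The names `relaxed_fraction`, `required_sequences` and `select_with_relaxed_thresholds` are hypothetical and not part of this PR.

```python
def required_sequences(location, min_seqs, relaxed_fraction):
    """Sequences a location needs; locations not in the dict use the full requirement."""
    return min_seqs * relaxed_fraction.get(location, 1.0)


def select_with_relaxed_thresholds(seq_counts, min_seqs, relaxed_fraction):
    """Keep every location that meets its own (possibly relaxed) requirement."""
    return sorted(
        location
        for location, count in seq_counts.items()
        if count >= required_sequences(location, min_seqs, relaxed_fraction)
    )


# Made-up counts: Brazil and India only need 40% of the normal 500 sequences.
relaxed = {"Brazil": 0.4, "India": 0.4}
relaxed_selection = select_with_relaxed_thresholds(
    {"USA": 5000, "Brazil": 300, "India": 150, "Chile": 300},
    min_seqs=500,
    relaxed_fraction=relaxed,
)
print(relaxed_selection)
```

Compared with a plain force include plus a fixed 10% floor, this keeps a single code path and makes the per-country leniency explicit in the config.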

I played a bit with the settings and these seem quite reasonable to me. Notably, we should extend the window within which min-seqs are counted, as many very important countries (India, South Africa, Brazil) produce sequences with a significant delay. That doesn't make them useless for the analysis, so I don't think we should be as strict as we currently are.
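To illustrate why the window length matters, here is a small sketch that counts sequences per country inside a trailing window. It assumes the window is measured on collection date, which the description does not state; `count_recent_sequences`, `records`, `as_of` and `window_days` are hypothetical names.

```python
from datetime import date, timedelta


def count_recent_sequences(records, as_of, window_days):
    """Count sequences per location dated within the last window_days (inclusive).

    records -- iterable of (location, date) pairs; the date is assumed to be
               the collection date
    """
    cutoff = as_of - timedelta(days=window_days)
    counts = {}
    for location, collected in records:
        if cutoff <= collected <= as_of:
            counts[location] = counts.get(location, 0) + 1
    return counts


# Made-up records: India's sequences are older than the USA's.
as_of = date(2023, 3, 1)
records = [
    ("USA", as_of - timedelta(days=5)),
    ("USA", as_of - timedelta(days=20)),
    ("India", as_of - timedelta(days=45)),
    ("India", as_of - timedelta(days=50)),
]
short_window = count_recent_sequences(records, as_of, window_days=30)
long_window = count_recent_sequences(records, as_of, window_days=60)
print(short_window, long_window)
```

In this toy data, India contributes nothing under the 30-day window but is counted once the window is extended to 60 days.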

Here's how the new settings would look:

[image]

corneliusroemer commented 9 months ago

Can you look at this @trvrb, @joverlee521, @jameshadfield? It would be nice to have some important countries force included. The current country selection has nothing from South America or Africa.