nextstrain / forecasts-ncov

SARS-CoV-2 variant growth rates and frequency forecasts
https://nextstrain.org/sars-cov-2/forecasts/

Country include description out of sync? #76

Open corneliusroemer opened 1 year ago

corneliusroemer commented 1 year ago

Current Behavior

The website currently states for clade frequencies:

Only locations with more than 100 sequences from samples collected in the previous 150 days are included.


We show the following countries:

This doesn't seem to be correct, or is at least missing important context: when I look on covSpectrum (https://cov-spectrum.org/explore/World/AllSamples/from%3D2023-07-02%26to%3D2023-11-22/variants/international-comparison?&) for countries with more than 100 sequences with a collection date less than 150 days ago, I get the following countries:

Country | Total Variant Sequences | First seq. found at | Last seq. found at
United States | 99114 | 2023-26 | 2023-47
Canada | 33981 | 2023-26 | 2023-46
United Kingdom | 22897 | 2023-26 | 2023-46
Japan | 22771 | 2023-26 | 2023-45
South Korea | 18858 | 2023-26 | 2023-45
France | 17394 | 2023-26 | 2023-46
Spain | 14246 | 2023-26 | 2023-46
China | 13271 | 2023-26 | 2023-46
Australia | 7386 | 2023-26 | 2023-46
Sweden | 6758 | 2023-26 | 2023-47
Italy | 5333 | 2023-26 | 2023-47
Denmark | 4696 | 2023-27 | 2023-46
Singapore | 4517 | 2023-26 | 2023-44
Germany | 3514 | 2023-26 | 2023-46
Netherlands | 3139 | 2023-26 | 2023-46
Belgium | 3077 | 2023-26 | 2023-47
Brazil | 2781 | 2023-26 | 2023-45
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Ireland | 2343 | 2023-26 | 2023-47
Russia | 1963 | 2023-26 | 2023-44
Switzerland | 1916 | 2023-27 | 2023-46
Finland | 1668 | 2023-26 | 2023-45
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Iceland | 676 | 2023-27 | 2023-46
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35

Expected behavior

Brazil | 2781 | 2023-26 | 2023-45
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Russia | 1963 | 2023-26 | 2023-44
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35

Notably, we include Iceland with only 676 sequences but exclude Brazil with 2781.

corneliusroemer commented 1 year ago

I think the text is wrong, as the config shows:

        location_min_seq: 100
        location_min_seq_days: 30

So in reality, to be included, a location needs 100 sequences collected within 30 days of today. I think it would be good to relax this: recency is not the most important criterion. Some countries just don't have recent data, but that doesn't mean they shouldn't be included if their data is slightly more delayed. So I think location_min_seq_days should be increased to at least 60 days or so.
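To make the effect of the window concrete, here is a minimal Python sketch of the inclusion rule as described above. It is not the repository's actual code: the function name `included_locations`, the record layout, and the counts are all hypothetical. The `>=` comparison and counting days back from "today" are also assumptions.

```python
from datetime import date, timedelta


def included_locations(records, today, min_seq=100, min_seq_days=30):
    """Return locations with at least `min_seq` sequences collected in the
    `min_seq_days` days before `today` (hypothetical helper, not repo code).

    `records` is an iterable of (location, collection_date, n_sequences).
    """
    cutoff = today - timedelta(days=min_seq_days)
    totals = {}
    for location, collected, n in records:
        # Only count sequences whose collection date falls inside the window.
        if cutoff <= collected <= today:
            totals[location] = totals.get(location, 0) + n
    return sorted(loc for loc, total in totals.items() if total >= min_seq)


# Made-up counts for illustration only (not taken from covSpectrum).
today = date(2023, 11, 22)
records = [
    ("Country A", date(2023, 11, 10), 150),  # recent data
    ("Country B", date(2023, 10, 5), 400),   # plenty of data, but 48 days old
    ("Country C", date(2023, 11, 15), 40),   # recent, but too few sequences
]

print(included_locations(records, today, min_seq_days=30))  # ['Country A']
print(included_locations(records, today, min_seq_days=60))  # ['Country A', 'Country B']
```

With a 30-day window, "Country B" is dropped despite having the most sequences. Widening the window to 60 days brings it back.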

In addition, the website/HTML should pull the description from the config file rather than hard-coding it, so that docs and code stay in sync automatically.
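A rough Python sketch of what that could look like: the description is rendered from the same two config values quoted above. The function name `describe_inclusion` is hypothetical, the config is assumed to be already parsed into a dict, and the "at least" wording assumes the threshold is inclusive (the website currently says "more than").

```python
def describe_inclusion(config):
    """Build the website blurb from config values (hypothetical helper)."""
    return (
        f"Only locations with at least {config['location_min_seq']} sequences "
        f"from samples collected in the previous "
        f"{config['location_min_seq_days']} days are included."
    )


# Values as shown in the config snippet quoted in this issue.
config = {"location_min_seq": 100, "location_min_seq_days": 30}
print(describe_inclusion(config))
```

Changing `location_min_seq_days` in the config would then change the displayed text without a separate edit to the HTML.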

corneliusroemer commented 1 year ago

These are the force-excluded countries: Austria, Czech Republic, Lithuania, Luxembourg, Slovakia.

Not sure why we'd force-exclude Czechia with 10m people but not Iceland with fewer than 400k.