neherlab / covid19_scenarios

Models of COVID-19 outbreak trajectories and hospital demand
https://covid19-scenarios.org
MIT License

🇧🇷 Brazil case data is incorrect #718

ivan-aksamentov closed this issue 4 years ago

ivan-aksamentov commented 4 years ago

@pauloangelo reported the inconsistencies in data for Brazil

The counts for "BRA-Distrito Federal" include cases from other regions that were detected in Distrito Federal. The Brasil.io dataset registers such external cases as "Importados/Indefinidos". I suggest counting only the local cases. For example, on 29-May-2020 there were 142 local deaths, while the TSV counts 154.
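A quick check of the figures quoted above for Distrito Federal on 29-May-2020: the gap between the two totals is the number of deaths that fall under "Importados/Indefinidos", roughly 7.8% of the TSV value. The helper name `imported_share` is hypothetical and used only for illustration.

```python
def imported_share(reported_total: int, local_total: int) -> tuple[int, float]:
    """Return the imported/undefined count and its share of the reported total."""
    imported = reported_total - local_total
    return imported, imported / reported_total


# Distrito Federal, 29-May-2020: 154 deaths in the TSV, 142 local deaths
imported, share = imported_share(154, 142)
print(imported, f"{share:.1%}")  # 12 7.8%
```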

Source: https://github.com/neherlab/covid19_scenarios/issues/69#issuecomment-636389668

Unfortunately, we don't have reliable data on hospital capacities; the government keeps changing this information. For population sizes, R0, etc., I believe we can provide values, at least for "BRA-Distrito Federal". Below is the link to the data we have been using in our weekly report.

Weekly reports created by our observatory (the parameters are also justified there): https://www.prepidemia.org/boletins-quinzenais-prepidemia

Source: https://github.com/neherlab/covid19_scenarios/issues/69#issuecomment-636390682

ivan-aksamentov commented 4 years ago

@pauloangelo Could you help us better understand what is wrong and how to proceed? Is the data on Brasil.io incorrect, or are we not parsing it correctly (or both)?

These are the columns that we receive in their CSV file:

date,state,city,place_type,confirmed,deaths,is_last,estimated_population_2019,city_ibge_code,confirmed_per_100k_inhabitants,death_rate

The parser currently takes the date, confirmed and deaths columns as-is. There is no column for imported cases. @rneher How do we deal with cases detected in one region but imported from another? Which region do we register them to?

@pauloangelo Would you suggest another data source where imported cases are handled better? We also don't have any ICU case data (the "icu" column in the TSV file). Have you, by any chance, seen this information available anywhere?

Hospital capacity is ultimately shown only for reference in the plot; it does not influence the algorithm. ICU capacity, however, is important: in our model, ICU overflow sharply increases mortality. If nation-wide ICU capacity is available, we could scale it by regional population size to obtain per-region capacities.
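A minimal sketch of the population-based scaling mentioned above, assuming ICU beds are spread uniformly per capita (only a rough approximation of where beds actually are). The function name, region keys, and numbers in the usage line are hypothetical placeholders, not real Brazilian data.

```python
def scale_icu_capacity(national_icu_beds: float,
                       region_populations: dict[str, int]) -> dict[str, float]:
    """Split a national ICU capacity across regions in proportion to population.

    Assumes capacity is uniform per capita, and that region_populations lists
    every region, so that their sum equals the national population.
    """
    national_population = sum(region_populations.values())
    return {
        region: national_icu_beds * population / national_population
        for region, population in region_populations.items()
    }


# Hypothetical numbers for illustration only
capacity = scale_icu_capacity(1000, {"BRA-A": 3_000_000, "BRA-B": 1_000_000})
print(capacity)  # {'BRA-A': 750.0, 'BRA-B': 250.0}
```

Because the shares always sum to one, the per-region capacities add back up to the national total; regions with unusually dense hospital infrastructure will be underestimated by this approach.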

If you have the time, feel free to investigate and submit a draft pull request. We can always discuss it here.

Some additional links:

pauloangelo commented 4 years ago

Hi @ivan-aksamentov ,

The imported or undefined counts are in rows where city is "Importados/Indefinidos". These are cases registered in that region but belonging to individuals from other regions, or cases whose origin is undefined. The SEIR model should not include them, because these individuals were probably infected outside the region, so it is not reasonable to count them toward the region's population.

I made some changes to "parsers/brazil.py" to remove the imported/undefined counts (below). The code may not be pretty, but it can be used for reference.

    import csv
    import io
    from collections import defaultdict

    # `r` (the HTTP response), `state_codes`, `stoi`, `sorted_date`, `store_data`
    # and `cols` are defined elsewhere in parsers/brazil.py.
    IMPORTED = "Importados/Indefinidos"

    # Parse the CSV once and reuse the rows for both passes (header skipped).
    rows = list(csv.reader(io.StringIO(r.text)))[1:]

    # Added block: imported/undefined (cases, deaths) per state and date
    regions_external = defaultdict(dict)
    for row in rows:
        if row[2] != IMPORTED:
            continue
        state = '-'.join(['BRA', state_codes[row[1]]])
        regions_external[state][row[0]] = (stoi(row[4]), stoi(row[5]))

    regions = defaultdict(list)
    for row in rows:
        if row[2] != "":  # state-level totals have an empty city column
            continue
        state = '-'.join(['BRA', state_codes[row[1]]])
        date = row[0]
        cases = stoi(row[4])
        deaths = stoi(row[5])

        # Remove the imported/undefined counts reported for the same state and date
        external = regions_external.get(state, {}).get(date)
        if external is not None:
            cases -= external[0]
            deaths -= external[1]

        regions[state].append([date, cases, deaths, None, None, None])

    for state, data in regions.items():
        regions[state] = sorted_date(data, cols)
    store_data(regions, 'brazil', cols)
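
To see the effect of this subtraction without downloading the full dataset, here is a self-contained sketch on a tiny made-up CSV using a subset of the Brasil.io columns listed earlier. It uses `csv.DictReader` instead of positional indices, keys regions by the raw state code rather than through `state_codes`, and treats empty numbers as 0; these are simplifications for illustration, and the names `local_counts` and `to_int` are hypothetical.

```python
import csv
import io
from collections import defaultdict

IMPORTED = "Importados/Indefinidos"


def to_int(value: str) -> int:
    """Parse a count, treating an empty field as 0 (simplifying assumption)."""
    return int(value) if value else 0


def local_counts(csv_text: str) -> dict[str, list[tuple[str, int, int]]]:
    """Return {state: [(date, cases, deaths), ...]} with imported/undefined counts removed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Imported/undefined (cases, deaths) keyed by (state, date)
    external = {
        (row["state"], row["date"]): (to_int(row["confirmed"]), to_int(row["deaths"]))
        for row in rows
        if row["city"] == IMPORTED
    }

    regions = defaultdict(list)
    for row in rows:
        if row["city"] != "":  # keep only the state-level totals
            continue
        ext_cases, ext_deaths = external.get((row["state"], row["date"]), (0, 0))
        regions[row["state"]].append((
            row["date"],
            to_int(row["confirmed"]) - ext_cases,
            to_int(row["deaths"]) - ext_deaths,
        ))
    # ISO dates sort correctly as strings
    return {state: sorted(data) for state, data in regions.items()}


# Made-up sample rows, for illustration only
SAMPLE = """date,state,city,place_type,confirmed,deaths
2020-05-29,DF,,state,100,15
2020-05-29,DF,Importados/Indefinidos,city,10,2
2020-05-29,DF,Brasília,city,90,13
2020-05-28,DF,,state,95,14
"""

print(local_counts(SAMPLE))
# {'DF': [('2020-05-28', 95, 14), ('2020-05-29', 90, 13)]}
```

On dates without an "Importados/Indefinidos" row, the state total passes through unchanged.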

pauloangelo commented 4 years ago

@ivan-aksamentov

Currently, Brasil.io is the best initiative I know of. I'm in contact with them to see whether ICU data can be added to the dataset.

ivan-aksamentov commented 4 years ago

@pauloangelo Thanks! I submitted the code as https://github.com/neherlab/covid19_scenarios/pull/721 Please review it for correctness and check the resulting .tsv files.

pauloangelo commented 4 years ago

Hi @ivan-aksamentov ,

I will check the TSV and get back to you. Here are my answers to the other questions:

Do you know if these imported cases are counted in any of the regions? => I believe the imported cases are not counted in other regions.

How do we make sure we don't lose any cases? => Overall, we may lose some cases, but in most regions the loss is negligible. In Distrito Federal (DF), the imported cases are somewhat more relevant because DF has a relatively small population and relatively better medical infrastructure. DF is surrounded by inland cities belonging to other regions, so many patients from these cities come to DF for medical care, and some of them die there. Such (imported/undefined) deaths represent ~7% of all deaths in DF. On the other hand, these cases should not be included in the SEIR model because the patients were not infected here. There may also be people infected in DF but counted in other regions, but these cases should be far fewer.

Does import here mean import from abroad, or is it just movement between regions of Brazil? => Import means the cases were infected outside the region and detected inside it. Most of them come from other regions of Brazil, but some may come from other countries as well.

ivan-aksamentov commented 4 years ago

@pauloangelo Thanks, Paulo. Every country and region has its own specifics. This is very interesting. Looking forward to your input and further suggestions!

ivan-aksamentov commented 4 years ago

Resolved in #721