Closed ivan-aksamentov closed 4 years ago
@pauloangelo Could you help us better understand what is wrong and how to proceed? Is the data on brazil.io incorrect, or do we not parse it correctly (or both)?
These are the columns that we receive from their CSV file:
date,state,city,place_type,confirmed,deaths,is_last,estimated_population_2019,city_ibge_code,confirmed_per_100k_inhabitants,death_rate
The parser currently just takes date, confirmed and deaths as is. There is no imported column. @rneher How do we deal with cases detected in one region but imported from another? To which region do we register them?
@pauloangelo Would you suggest another data source where the import situation is handled better? We also don't have any ICU case data ("icu" column in the TSV file). Have you by any chance seen this information available somewhere?
Hospital capacity is ultimately only for reference in the plot, it does not influence the algorithm. However, the ICU capacity is important. In our model, the overflow of the ICU sharply increases mortality. If nation-wide ICU capacity is available, then we might scale it down by region population sizes to get the per-region capacities.
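A minimal sketch of the population-based scaling mentioned above, assuming a single nation-wide ICU bed count and a mapping of region populations are available. The function name, region codes and numbers are hypothetical and for illustration only; this is not part of the existing parser.

```python
def scale_icu_capacity(national_icu_beds, region_populations):
    """Split a nation-wide ICU bed count across regions in proportion to population."""
    total_population = sum(region_populations.values())
    if total_population <= 0:
        raise ValueError("total population must be positive")
    return {
        region: round(national_icu_beds * population / total_population)
        for region, population in region_populations.items()
    }


# Made-up region codes and numbers, not real data.
example = scale_icu_capacity(
    1000, {"BRA-AA": 600_000, "BRA-BB": 300_000, "BRA-CC": 100_000}
)
print(example)
```

Because each share is rounded independently, the per-region values may not add up exactly to the national total. Proportional scaling also ignores real differences in infrastructure between regions, so it is only a fallback until per-region data is available.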
If you have the time and resources, feel free to investigate and submit a draft pull request. We can always discuss here.
Some additional links:
Our parser for Brazil.io: https://github.com/neherlab/covid19_scenarios/blob/67cc45f942/data/parsers/brazil.py
Our current output of the parser in TSV format: https://github.com/neherlab/covid19_scenarios/tree/67cc45f942/data/case-counts/brazil
Our data Guide: https://github.com/neherlab/covid19_scenarios/blob/master/data/README.md
Hi @ivan-aksamentov ,
The imported or undefined counts are in rows where city is "Importados/Indefinidos". These are the counts of cases registered in that region but related to individuals from other regions, or undefined cases. The SEIR model should not consider such cases because these individuals probably got infected outside the region. Thus, it is not reasonable to count them in the region's population.
I made some changes to "parsers/brazil.py" to remove the imported/undefined counts (below). The code may not be good, but it can be used for reference.
regions = defaultdict(list)
fd = io.StringIO(r.text)
rdr = csv.reader(fd)
hdr = next(rdr)
# Added block
regions_external = {}
for row in rdr:
    state = '-'.join(['BRA',state_codes[row[1]]])
    city = row[2]
    if city != "Importados/Indefinidos": continue
    date = row[0]
    cases = stoi(row[4])
    deaths = stoi(row[5])
    if state not in regions_external:
        regions_external[state] = {}
    regions_external[state][date] = [cases, deaths, None, None, None]
fd = io.StringIO(r.text)
rdr = csv.reader(fd)
hdr = next(rdr)
for row in rdr:
    state = '-'.join(['BRA',state_codes[row[1]]])
    city = row[2]
    if city != "": continue
    date = row[0]
    cases = stoi(row[4])
    deaths = stoi(row[5])
    # remove the imported/undefined counts
    if state in regions_external and date in regions_external[state]:
        regions[state].append([date, cases-regions_external[state][date][0], deaths-regions_external[state][date][1], None, None, None])
    else:
        regions[state].append([date, cases, deaths, None, None, None])
for state, data in regions.items():
    regions[state] = sorted_date(data, cols)
store_data(regions, 'brazil', cols)
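The excerpt above relies on helpers from the parser (`r`, `state_codes`, `stoi`, `sorted_date`, `store_data`, `cols`). For reference, here is a self-contained sketch of the same two-pass idea that can be run on its own. The sample rows are made up, the helper names `to_int` and `state_counts_without_imported` are hypothetical, and treating an empty count as zero is an assumption of this sketch.

```python
import csv
import io
from collections import defaultdict

IMPORTED = "Importados/Indefinidos"

# Made-up sample rows using a subset of the Brasil.io columns.
SAMPLE_CSV = """date,state,city,place_type,confirmed,deaths
2020-05-02,DF,,state,120,12
2020-05-01,DF,,state,100,10
2020-05-01,DF,Importados/Indefinidos,city,7,1
2020-05-01,DF,Brasília,city,93,9
"""


def to_int(value):
    """Parse a count, treating an empty field as zero (assumption)."""
    return int(value) if value else 0


def state_counts_without_imported(csv_text):
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # First pass: imported/undefined counts keyed by (state, date).
    imported = {
        (row["state"], row["date"]): (to_int(row["confirmed"]), to_int(row["deaths"]))
        for row in rows
        if row["city"] == IMPORTED
    }

    # Second pass: state-level rows (empty city) minus the imported counts.
    regions = defaultdict(list)
    for row in rows:
        if row["city"] != "":
            continue
        imp_cases, imp_deaths = imported.get((row["state"], row["date"]), (0, 0))
        regions[row["state"]].append(
            (
                row["date"],
                to_int(row["confirmed"]) - imp_cases,
                to_int(row["deaths"]) - imp_deaths,
            )
        )
    # ISO dates sort correctly as plain strings.
    return {state: sorted(data) for state, data in regions.items()}


result = state_counts_without_imported(SAMPLE_CSV)
print(result)
```

Keying the imported counts by `(state, date)` avoids the nested dictionary and the second read of the CSV text, but the subtraction itself is the same as in the excerpt.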
@ivan-aksamentov
Currently, Brasil.io is the best initiative that I know of. I'm in contact with them to check whether it is possible to include the ICU data in the dataset.
@pauloangelo Thanks! I submitted the code as https://github.com/neherlab/covid19_scenarios/pull/721. Please review its correctness and the resulting .tsv files.
Hi @ivan-aksamentov ,
I will check the TSV files and report back. Below are my answers to the other questions.
Do you know if these imported cases are accounted for in any of the regions? => I believe that the imported cases are not counted in other regions.
How do we make sure we don't lose any cases? => In total, we may lose cases. However, in most cases this loss is negligible. In Distrito Federal (DF) the imported cases are somewhat relevant, because we have a relatively small population and relatively better medical infrastructure. DF is surrounded by interior cities of other regions, so many patients from these cities come to DF for medical assistance and some of them die. Such deaths (imported/undefined) represent ~7% of the total deaths in DF. On the other hand, these cases should not be considered in the SEIR model because they were not infected here. There may also be people infected in DF and counted in other regions, but those cases should be far fewer.
Does import here mean import from abroad, or is this just movement between regions of Brazil? => Import means that the cases were infected outside the region and detected in the region. Most of them are from other regions of Brazil, but some may come from other countries as well.
@pauloangelo Thanks Paulo. Every country and region has its specifics. This is very interesting. Looking forward to your input and further suggestions!
Resolved in #721
@pauloangelo reported the inconsistencies in data for Brazil
Source: https://github.com/neherlab/covid19_scenarios/issues/69#issuecomment-636389668
Source: https://github.com/neherlab/covid19_scenarios/issues/69#issuecomment-636390682