nextstrain / fauna

RethinkDB database to support real-time virus analysis
GNU Affero General Public License v3.0

Counts in HI strain files need to be properly summed #62

Closed trvrb closed 7 years ago

trvrb commented 7 years ago

The --all option in download_all.py is much appreciated. However, it needs to be smarter about how the hi_strains.tsv files are combined. Here,

https://github.com/nextstrain/fauna/blob/master/download_all.py#L64

the HI strain tsvs should be merged so that the counts for a strain appearing in more than one file are summed, rather than left on separate rows. Each individual tsv looks like:

A/Pakistan/431/2015 5
A/Mexico/4159/2016  6
A/Kazakhstan/646/2016   5
...
A/Pakistan/431/2015 6

The combined all_hi_strains.tsv needs to have a single row per strain, with the counts summed:

A/Pakistan/431/2015 11

The titers tsv files can be concatenated just as they are now.
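
One possible way to do the summing is sketched below. This is only an illustration, not the code in download_all.py: the function names `sum_strain_counts` and `combine_hi_strains` are hypothetical, and it assumes each hi_strains.tsv line is a strain name and an integer count separated by a tab.

```python
from collections import Counter


def sum_strain_counts(lines):
    """Sum counts per strain over 'strain<TAB>count' lines (assumed format)."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last tab so strain names containing spaces stay intact.
        strain, count = line.rsplit("\t", 1)
        totals[strain] += int(count)
    return totals


def combine_hi_strains(input_paths, output_path):
    """Merge several HI strain tsvs into one file with one row per strain."""
    totals = Counter()
    for path in input_paths:
        with open(path) as handle:
            # Counter.update adds counts instead of overwriting them.
            totals.update(sum_strain_counts(handle))
    with open(output_path, "w") as out:
        # Write strains from highest to lowest total count.
        for strain, count in totals.most_common():
            out.write("{}\t{}\n".format(strain, count))
    return totals
```

With the example above, A/Pakistan/431/2015 would end up on a single row with a count of 11. The titers files are not touched by this and can still be concatenated directly.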