nextstrain / fauna

RethinkDB database to support real-time virus analysis
GNU Affero General Public License v3.0

Assign correct host to titers from non-ferret hosts (e.g., human and mouse) #130

Closed: huddlej closed 8 months ago

huddlej commented 1 year ago

Current behavior

On ingest of CDC titers, we try to annotate a serum_host column in tdb. The serum_host column in fauna is usually null, and recent “HUMAN” pooled data are not labeled as “human” because the annotation in the tdb ingest only applies on an exact match to “Human”.

We also have some mouse-based data that appear with serum id values like L20MouseS0007.

Expected behavior

We should parse the host species from the serum id based on a standard set of expected values and set the default host to "ferret" in the absence of other details.

Possible solution

At a minimum, we need to update the lines referenced above to apply a regex or other search to the serum id and set the host to "human" or "mouse" for the cases we know about.
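A minimal sketch of what that search could look like, not the actual fauna upload code. The function name `infer_serum_host`, the `HOST_PATTERNS` table, and the choice to check the existing serum_host value before the serum id are all assumptions for illustration. The only known inputs are the ones mentioned in this issue ("HUMAN" pooled sera and ids like L20MouseS0007).

```python
import re

# Hypothetical table of known non-ferret hosts. Matching is case-insensitive,
# so "HUMAN", "Human", and "human" are all treated the same way.
HOST_PATTERNS = [
    ("human", re.compile(r"human", re.IGNORECASE)),
    ("mouse", re.compile(r"mouse", re.IGNORECASE)),
]

# Default host when nothing in the record indicates otherwise.
DEFAULT_HOST = "ferret"


def infer_serum_host(serum_id, serum_host=None):
    """Return "human", "mouse", or the default "ferret" for a titer record.

    Checks the existing serum_host annotation first (if any), then the
    serum id, and falls back to the default when neither matches.
    """
    for value in (serum_host, serum_id):
        if not value:
            continue
        for host, pattern in HOST_PATTERNS:
            if pattern.search(value):
                return host
    return DEFAULT_HOST


if __name__ == "__main__":
    print(infer_serum_host("L20MouseS0007"))    # mouse
    print(infer_serum_host("HUMAN POOL"))       # human
    print(infer_serum_host("F1234", "Human"))   # human
    print(infer_serum_host("F1234"))            # ferret
```

A plain substring search is permissive by design. If some serum ids contain "human" or "mouse" for unrelated reasons, the patterns would need to be tightened against the real set of ids.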

@joverlee521 notes that: "if we update the upload scripts to not do exact matches, [we] can re-upload the titer data. Since the index fields do not include serum_host, these records should get updated in place."

Additional context

See related Slack thread.