nextstrain / seasonal-flu

Scripts, config, and snakefiles for seasonal-flu nextstrain builds

Use metadata from NA segment in joined metadata when HA segment isn't available #161

Open huddlej opened 5 months ago

Current Behavior

Our current approach to joining segment-level metadata records into isolate-level metadata records is HA-centric: an NA record without a matching HA record contributes none of its metadata to the isolate-level record.

Expected behavior

When HA records are missing, we still want to know as much as possible about the NA record including the isolate id, the collection date, etc. We will use this information in segment-level analyses such as the flu_frequencies workflow where we estimate NA-specific clade frequencies and want to use all available NA records.

Possible solution

One solution could be to update the join_metadata script to define all segment-specific columns (e.g., "passage_category" should be segment-specific) and then fill each remaining isolate-level column (e.g., date, region, country) from the first segment record in which it is present.
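A minimal sketch of that idea in plain Python, not the actual join_metadata implementation. The column names (`isolate_id`, `accession`, `passage_category`), the segment order, and the function name `join_segment_metadata` are all assumptions made for illustration; the real script's inputs and columns may differ.

```python
# Hypothetical column names; the real join_metadata script may use others.
SEGMENT_SPECIFIC_COLUMNS = {"accession", "passage_category"}
# Segments are visited in this order, so HA values win when both are present.
SEGMENT_ORDER = ["ha", "na"]


def join_segment_metadata(records_by_segment, id_column="isolate_id"):
    """Join segment-level records (dicts) into isolate-level records.

    Segment-specific columns are stored with a segment suffix
    (e.g., "passage_category_na"). Every other column is treated as
    isolate-level and is filled from the first segment record that has
    a non-empty value for it.
    """
    isolates = {}
    for segment in SEGMENT_ORDER:
        for record in records_by_segment.get(segment, []):
            isolate_id = record[id_column]
            isolate = isolates.setdefault(isolate_id, {id_column: isolate_id})
            for column, value in record.items():
                if column == id_column:
                    continue
                if column in SEGMENT_SPECIFIC_COLUMNS:
                    isolate[f"{column}_{segment}"] = value
                elif value not in (None, "") and column not in isolate:
                    # First non-empty value for an isolate-level column wins.
                    isolate[column] = value
    return list(isolates.values())


# Example: the second isolate has an NA record but no HA record.
ha_records = [
    {"isolate_id": "isolate_1", "date": "2023-01-05", "region": "Europe",
     "passage_category": "cell"},
]
na_records = [
    {"isolate_id": "isolate_1", "date": "2023-01-05", "region": "Europe",
     "passage_category": "egg"},
    {"isolate_id": "isolate_2", "date": "2023-02-10", "region": "Oceania",
     "passage_category": "unpassaged"},
]
joined = join_segment_metadata({"ha": ha_records, "na": na_records})
by_id = {record["isolate_id"]: record for record in joined}
```

With this approach, `isolate_2` keeps its date and region from the NA record even though it has no HA record, and `passage_category` is tracked separately per segment instead of being overwritten.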