nextstrain / seasonal-flu

Scripts, config, and Snakefiles for seasonal-flu Nextstrain builds

Run pathogen-embed on HA/NA alignments to flag putative reassortant clades #174

Closed: huddlej closed this issue 2 weeks ago

huddlej commented 2 months ago

Context

From our work in Nanduri et al., we developed the pathogen-embed tools to project seasonal flu alignments into low-dimensional representations and identify clusters of genetically related sequences. We can use these tools to jointly embed alignments from multiple genes like HA and NA and identify putative reassortment events. The pathogen-embed package is now part of the Nextstrain Docker and Conda environments, so we can easily run these tools from our seasonal flu workflows.

Description

Add rules to the core seasonal flu workflow that annotate HA and NA trees with two kinds of attributes: t-SNE embedding coordinates (tsne_x and tsne_y), computed with pathogen-distance and pathogen-embed, and labels for clusters identified by pathogen-cluster (tsne_label). Calculate distances for each gene segment separately, then produce a single t-SNE embedding from all distances and alignments together, using the optimal settings from Nanduri et al. Finally, identify clusters using the optimal settings for Nextstrain clades from the same work.
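To make the data flow concrete, here is a minimal pure-Python sketch of two steps in the description: computing a distance matrix per segment and combining them into one joint matrix, and shaping the resulting coordinates and labels as Augur node data for tree annotation. It is not the pathogen-embed API and does not reproduce the t-SNE or clustering steps. Several points are assumptions: the functions `hamming`, `distance_matrix`, `joint_distance_matrix`, and `to_node_data` are hypothetical; distances count mismatches only where both sequences have A, C, G, or T; and the per-segment matrices are combined by element-wise summation, which may differ from how pathogen-embed actually joins multiple distance matrices.

```python
import json
from itertools import combinations

# Sites are compared only where both sequences carry A, C, G, or T; gaps ("-"),
# N, and other ambiguity codes are skipped. This is an assumption of the sketch,
# not a description of pathogen-distance's own defaults.
NUCLEOTIDES = frozenset("ACGT")


def hamming(seq_a: str, seq_b: str) -> int:
    """Count mismatches between two aligned sequences at unambiguous sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in NUCLEOTIDES and b in NUCLEOTIDES and a != b
    )


def distance_matrix(alignment: dict[str, str], strains: list[str]) -> list[list[int]]:
    """Symmetric pairwise distance matrix for one segment, rows/columns in `strains` order."""
    n = len(strains)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = hamming(alignment[strains[i]], alignment[strains[j]])
        matrix[i][j] = matrix[j][i] = d
    return matrix


def joint_distance_matrix(
    alignments: dict[str, dict[str, str]],
) -> tuple[list[str], list[list[int]]]:
    """Compute each segment's matrix separately, then sum them element-wise.

    Only strains present in every segment's alignment are kept, so each row
    describes the same virus across HA and NA.
    """
    strains = sorted(set.intersection(*(set(aln) for aln in alignments.values())))
    n = len(strains)
    joint = [[0] * n for _ in range(n)]
    for alignment in alignments.values():
        per_segment = distance_matrix(alignment, strains)
        for i in range(n):
            for j in range(n):
                joint[i][j] += per_segment[i][j]
    return strains, joint


def to_node_data(
    coords: dict[str, tuple[float, float]], labels: dict[str, str]
) -> dict:
    """Shape embedding coordinates and cluster labels as Augur node data."""
    nodes = {}
    for strain, (x, y) in coords.items():
        attrs = {"tsne_x": x, "tsne_y": y}
        if strain in labels:
            attrs["tsne_label"] = labels[strain]
        nodes[strain] = attrs
    return {"nodes": nodes}


if __name__ == "__main__":
    # Toy alignments; "A/4" is only in NA, so it is dropped from the joint matrix.
    alignments = {
        "HA": {"A/1": "ACGTACGT", "A/2": "ACGTACGA", "A/3": "TCGTACGA"},
        "NA": {"A/1": "GGCC", "A/2": "GGCA", "A/3": "GG-A", "A/4": "GGGG"},
    }
    strains, joint = joint_distance_matrix(alignments)
    print(strains)  # ['A/1', 'A/2', 'A/3']
    print(joint)  # [[0, 2, 3], [2, 0, 1], [3, 1, 0]]

    # Placeholder coordinates and labels stand in for pathogen-embed/pathogen-cluster output.
    print(json.dumps(to_node_data({"A/1": (1.5, -0.5)}, {"A/1": "0"}), indent=2))
```

Keeping the segments separate until the final sum mirrors the "distances for each gene segment individually" step. A strain whose HA and NA place it near different groups will then land between them in the joint space, which is the signal that makes a putative reassortant stand out. Skipping gaps and ambiguous bases keeps incomplete sequences from looking artificially divergent.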