[Feature]: Add an indication of which genotype the sample (most likely) belongs to

This request comes from Rachel Palinski, Bill Wilson, and Dana Mitzel

Summary

Place a table on the front page of The Visualizer showing which genotype/strain each sample's consensus sequence and haplotypes most closely match by BLAST.
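
The genotype call could come from keeping the top-scoring BLAST hit for each consensus or haplotype sequence. Here is a minimal Python sketch. It assumes blastn has already been run against a database of genotype/strain reference sequences with tabular output (-outfmt 6, default columns). The function name best_hits and the sequence IDs in the example are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(outfmt6_text):
    """Return {query_id: (subject_id, pident, bitscore)}, keeping the highest bitscore per query."""
    best = {}
    reader = csv.reader(io.StringIO(outfmt6_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(OUTFMT6_FIELDS, row))
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        # On a tie, the first hit seen is kept.
        if query not in best or score > best[query][2]:
            best[query] = (rec["sseqid"], float(rec["pident"]), score)
    return best


# Hypothetical example: one consensus sequence hitting two genotype references.
example = (
    "sample1_consensus\tgenotypeA_ref\t99.1\t1000\t9\t0\t1\t1000\t1\t1000\t0.0\t1820\n"
    "sample1_consensus\tgenotypeB_ref\t91.4\t1000\t86\t0\t1\t1000\t1\t1000\t0.0\t1310\n"
)
print(best_hits(example))
# {'sample1_consensus': ('genotypeA_ref', 99.1, 1820.0)}
```

Ranking by bitscore rather than percent identity takes alignment length into account. A real implementation might also want a minimum identity threshold before reporting a call.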

Added Features

Additional processes

This feature may need to be implemented in Python/R/Julia within a new process block. It should not require any new tools, however.

Additional visualizer section

Attachments: seq graph.zip, seq graph screenshot

The attached file contains an HTML page with a prototype of the genome table design. The table has columns displaying:

- The sample name
- The haplotype name
- The haplotype abundance within that sample
- The genotype/strain name
- A link to the GenBank record for that genotype
- The annotated sequence of the haplotype
  - This sequence is color-coded by base, highlights any variant positions in each haplotype sequence, and scrolls sideways.
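
Here is a minimal Python sketch of how one table row, including the annotated-sequence column, could be rendered at pipeline run time. It assumes the haplotype and reference have already been aligned to the same length. The CSS class names, the NCBI nuccore URL pattern for the GenBank link, and the function names render_sequence and render_row are all hypothetical choices, not taken from the prototype.

```python
from html import escape

# Hypothetical CSS class names, one per base; the prototype's styling may differ.
BASE_CLASSES = {"A": "base-a", "C": "base-c", "G": "base-g", "T": "base-t"}


def render_sequence(haplotype, reference):
    """Return an HTML <div> with one <span> per base of an aligned haplotype.

    Each base gets a color class; positions that differ from the aligned
    reference also get a "variant" class so CSS can highlight them.
    """
    if len(haplotype) != len(reference):
        raise ValueError("haplotype and reference must be aligned to the same length")
    spans = []
    # upper() because aligners such as MAFFT may emit lowercase bases.
    for base, ref_base in zip(haplotype.upper(), reference.upper()):
        classes = [BASE_CLASSES.get(base, "base-other")]  # gaps/ambiguity codes
        if base != ref_base:
            classes.append("variant")
        spans.append(f'<span class="{" ".join(classes)}">{escape(base)}</span>')
    # overflow-x: auto plus nowrap lets long sequences scroll sideways in the cell.
    return (
        '<div style="overflow-x: auto; white-space: nowrap; font-family: monospace">'
        + "".join(spans)
        + "</div>"
    )


def render_row(sample, haplotype_name, abundance, genotype, accession, haplotype, reference):
    """Return one <tr> with the six columns described above."""
    # Assumed link target: NCBI nucleotide record looked up by accession.
    link = f"https://www.ncbi.nlm.nih.gov/nuccore/{escape(accession)}"
    cells = [
        escape(sample),
        escape(haplotype_name),
        f"{abundance:.1%}",  # e.g. 0.25 -> "25.0%"
        escape(genotype),
        f'<a href="{link}">{escape(accession)}</a>',
        render_sequence(haplotype, reference),
    ]
    return "<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>"
```

Doing the variant comparison in Python, rather than in the browser, keeps the output a static HTML fragment. That fragment could then be included in The Visualizer without extra JavaScript.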

This graph should go front-and-center on the home page of The Visualizer.

More Info

Context

Dr. Palinski wanted easy-to-read genome calls. It took me a while to figure out the best place to put them. Bill and Dana like pretty graphs. They couldn't really tell me what the graphs should look like, so I guessed and came up with this. It is very information-dense and should please everyone.

Possible implementation

To pull this off, we will need to:

- Convert all haplotype YAMLs into haplotype FASTAs while preserving frequency data
  - haplotyping:HAPLINK_FASTA currently performs this conversion, while SIMULATED_READS:HAPLOTYPE_DEPTH calculates depth from single-haplotype YAML files. This will need to be rethought.
- Concatenate the following for all samples:
  - Haplotype FASTAs + frequencies
  - Consensus sequences
- Align each sequence to the reference genome specified by params.genome
  - All of the aligned sequences need to be exactly the same length, so a multiple sequence alignment using MAFFT might be the best option
  - If we use a multiple alignment, converting it to a metadata-rich format such as Nexus might be useful for preserving frequency data
- Take each of those sequences, associate it back to its sample, and print it to an HTML table
  - We could generate this on the fly in Node.js, but I think it would be far better to create the table during the pipeline run, then embed it in The Visualizer via an <iframe> or include it statically.
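
A lighter-weight alternative to Nexus is to encode the sample, haplotype name, and frequency directly in each FASTA header. That way the metadata survives concatenation and alignment and can be split back out afterwards. Here is a minimal Python sketch under that assumption. The ">sample|haplotype|frequency" header convention and all function names are hypothetical, and it is worth confirming that MAFFT leaves such headers intact. The parser joins wrapped sequence lines and checks that every aligned sequence has the same length.

```python
from collections import defaultdict

# Hypothetical header convention: ">sample|haplotype|frequency".
# A consensus sequence could use haplotype "consensus" and frequency 1.0.
SEP = "|"


def to_fasta(records):
    """Serialize (sample, haplotype, frequency, sequence) tuples into one FASTA string."""
    lines = []
    for sample, haplotype, frequency, sequence in records:
        lines.append(f">{sample}{SEP}{haplotype}{SEP}{frequency}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Yield (header, sequence) pairs, joining sequence lines that were wrapped."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_alignment(aligned_text):
    """Map each sample to its aligned (haplotype, frequency, sequence) rows.

    Raises ValueError if the aligned sequences are not all the same length.
    Assumes sample and haplotype names never contain SEP.
    """
    by_sample = defaultdict(list)
    lengths = set()
    for header, seq in parse_fasta(aligned_text):
        sample, haplotype, frequency = header.split(SEP)
        by_sample[sample].append((haplotype, float(frequency), seq))
        lengths.add(len(seq))
    if len(lengths) > 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")
    return dict(by_sample)
```

The string from to_fasta could be written to a file and aligned with something like mafft --auto combined.fasta > aligned.fasta. Then group_alignment would run on the contents of aligned.fasta, and each row would go to the HTML table step.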

ksumngs / yavsap