ksumngs / yavsap

Yet Another Viral Subspecies Analysis Pipeline
https://ksumngs.github.io/yavsap
MIT License

[Feature]: Make compatible with multi-strand viruses #25

Closed MillironX closed 2 years ago

MillironX commented 2 years ago

Summary

This request comes from Rachel Palinski and Bill Wilson.

The pipeline should understand and adapt for multi-segment viruses, like Rotavirus and Rift Valley Fever Virus.

More Info

Context

The PIs on this project want to be able to analyze Bovine and Porcine Rotavirus and Rift Valley Fever Virus via this pipeline.

Alternatives

Currently, these viruses can be analyzed with YAVSAP by running the pipeline once per segment, setting --genome to the closest reference for that segment each time. However, YAVSAP has changed drastically since this was last done (see #6), so this approach might not work anymore.
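As a rough illustration of that workaround, the sketch below builds one command line per segment. Only --genome comes from this issue. The `nextflow run ksumngs/yavsap` invocation, the helper name `per_segment_commands`, and the pairing of accessions to segments are assumptions made for the example, and any other parameters the pipeline needs would have to be passed in through `extra_args`.

```python
import shlex


def per_segment_commands(segment_refs, extra_args=()):
    """Build one pipeline command per segment (hypothetical helper).

    segment_refs maps a segment label to the accession of its closest
    reference. Only --genome is taken from the issue text; everything
    else the pipeline requires must be supplied via extra_args.
    """
    commands = []
    for segment, accession in segment_refs.items():
        cmd = ["nextflow", "run", "ksumngs/yavsap",
               "--genome", accession, *extra_args]
        commands.append((segment, cmd))
    return commands


# Illustrative accessions; the segment assignment is assumed
refs = {"L": "KX096938.1", "M": "KX096939.1", "S": "KX096940.1"}
for segment, cmd in per_segment_commands(refs):
    print(segment, shlex.join(cmd))
```

The commands are only printed here, not executed, so each one can be checked against the current YAVSAP parameters before running it.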

Possible implementation

This feature will require #23 to be completed.

I think the best way for the user to tell the pipeline the number of segments is via additional columns in the --genomes_list parameter. The new format might look something like:

#name          Segment L NCBI num  Segment M   Segment S
Kenya-128b-15  KX096938.1          KX096939.1  KX096940.1
SA01-1322      KX096941.1          KX096942.1  KX096943.1
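To make the proposed format concrete, here is a minimal sketch of how such a list could be read. It assumes the file is tab-delimited, that a first line starting with `#` is a header whose remaining columns name the segments, and that every strain lists one accession per segment. The function name `parse_genomes_list` and the simplified header labels are hypothetical, not part of YAVSAP.

```python
import csv
import io


def parse_genomes_list(text):
    """Parse a tab-delimited multi-segment genome list (assumed layout).

    Returns (segment_labels, strains), where strains maps each strain
    name to a dict of {segment_label: accession}.
    """
    segment_labels = []
    strains = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        row = [field.strip() for field in row]
        if not row or not row[0]:
            continue  # skip blank lines
        if row[0].startswith("#"):
            # Header line: every column after the first names a segment
            segment_labels = row[1:]
            continue
        name, accessions = row[0], row[1:]
        if not segment_labels:
            # No header seen: fall back to positional labels
            segment_labels = [f"segment_{i + 1}" for i in range(len(accessions))]
        if len(accessions) != len(segment_labels):
            raise ValueError(
                f"{name}: expected {len(segment_labels)} accessions, "
                f"got {len(accessions)}"
            )
        strains[name] = dict(zip(segment_labels, accessions))
    return segment_labels, strains


example = (
    "#name\tSegment L\tSegment M\tSegment S\n"
    "Kenya-128b-15\tKX096938.1\tKX096939.1\tKX096940.1\n"
    "SA01-1322\tKX096941.1\tKX096942.1\tKX096943.1\n"
)
labels, strains = parse_genomes_list(example)
```

With this layout the number of segments is simply `len(labels)`, so the user would not need to pass it separately.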

The problem: when dealing with quasi-species, how can we define subconsensus genotype calls for all segments collectively? Effectively pooling the results for all segments and showing them on The Visualizer will also require more thought.