nextstrain / oropouche

Oropouche Nextstrain build

Update ingest to accommodate oropouche segments #1

Closed: miparedes closed this 1 month ago

miparedes commented 1 month ago

Oropouche is a segmented virus (L, M, and S segments). To accommodate these segments and allow for downstream phylogenetic analysis, the ingest pipeline was customized to split the metadata by segment, and to also produce a metadata file and a sequences file containing all the sequences under results/all.
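As a rough illustration of the per-segment split described above, here is a minimal Python sketch. It is not the code in this PR (the actual ingest workflow follows nextstrain/lassa#12); the `segment` and `accession` column names, the TSV layout, and the helper names are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

SEGMENTS = ("L", "M", "S")  # the three Oropouche segments named in this PR


def read_tsv(text):
    """Parse TSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def split_metadata_by_segment(rows, segment_column="segment"):
    """Group metadata rows by segment.

    `segment_column` is a hypothetical column name. Rows whose segment is
    empty or not one of L/M/S are left out of the per-segment groups.
    """
    by_segment = defaultdict(list)
    for row in rows:
        segment = (row.get(segment_column) or "").strip().upper()
        if segment in SEGMENTS:
            by_segment[segment].append(row)
    return dict(by_segment)


# Toy metadata (made-up accessions), including one record with no segment.
example = "accession\tsegment\nA1\tL\nA2\tM\nA3\tS\nA4\t\nA5\tL\n"
all_rows = read_tsv(example)
groups = split_metadata_by_segment(all_rows)

for segment in SEGMENTS:
    print(segment, [row["accession"] for row in groups.get(segment, [])])
```

In this sketch `all_rows` plays the role of the combined metadata under results/all, while each entry of `groups` corresponds to one per-segment metadata file.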

This was adapted from the work done by @j23414 in nextstrain/lassa#12

I quickly compared the segment assignments made by Nextclade with the existing annotations on NCBI, and they are quite concordant: all the genomes annotated as L or M in NCBI were also assigned as L or M, respectively, by Nextclade.

There are two genomes that are annotated as S but were not assigned as such by Nextclade. A quick look shows that they're both from Culex mosquitoes and pretty short, so the sequencing quality might not be great to begin with. I can look into that more closely in the future.
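A comparison like this can be sketched as a small cross-tabulation. The snippet below is only an illustration with made-up records; the `ncbi_segment` and `nextclade_segment` field names and both helper functions are hypothetical and do not come from the pipeline.

```python
from collections import Counter


def _clean(value, missing_label):
    """Normalise a segment value, substituting a label when it is empty."""
    return (value or "").strip() or missing_label


def crosstab_segments(records, ncbi_key="ncbi_segment", nextclade_key="nextclade_segment"):
    """Count records for each (NCBI annotation, Nextclade assignment) pair."""
    counts = Counter()
    for rec in records:
        ncbi = _clean(rec.get(ncbi_key), "unannotated")
        nextclade = _clean(rec.get(nextclade_key), "unassigned")
        counts[(ncbi, nextclade)] += 1
    return counts


def discordant(records, ncbi_key="ncbi_segment", nextclade_key="nextclade_segment"):
    """Return records that have an NCBI annotation Nextclade did not reproduce."""
    return [
        rec
        for rec in records
        if (rec.get(ncbi_key) or "").strip()
        and (rec.get(ncbi_key) or "").strip() != (rec.get(nextclade_key) or "").strip()
    ]


# Made-up records for illustration only, not results from this PR.
records = [
    {"accession": "X1", "ncbi_segment": "L", "nextclade_segment": "L"},
    {"accession": "X2", "ncbi_segment": "M", "nextclade_segment": "M"},
    {"accession": "X3", "ncbi_segment": "S", "nextclade_segment": ""},
    {"accession": "X4", "ncbi_segment": "", "nextclade_segment": "S"},
]

table = crosstab_segments(records)
mismatches = [rec["accession"] for rec in discordant(records)]
for (ncbi, nextclade), n in sorted(table.items()):
    print(f"NCBI={ncbi}\tNextclade={nextclade}\t{n}")
print("discordant:", mismatches)
```

Records without an NCBI annotation are counted in the table but not reported as discordant, since there is nothing to disagree with.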

Screenshot 2024-07-30 at 5 09 04 PM

About 13% of the genomes didn't have a segment annotation, and Nextclade was able to assign a segment to all except 7. Below is their information; they're just really short segments, so it makes sense that Nextclade would struggle. oropouche_no_nextclade_segment_assignment.csv

It all runs perfectly thanks to @j23414's work on the lassa side.