jeromekelleher / sc2ts-paper


Notes on preprocessing data for run on Viridian 0.4 #210

Open szhan opened 2 months ago

szhan commented 2 months ago

Write down some details about deduplicating sample sequences and MAFFT alignment (version and options).

szhan commented 2 months ago

Consensus sequences assembled with Viridian by Hunt et al. (2024) were downloaded from Figshare (Viridian v0.4). We used only the sequences included in the Viridian phylogeny that Hunt et al. built using UShER.

The sequences were aligned using MAFFT v7.525 (2024/Mar/13) with the flags '--keeplength --add', as in Hunt et al., except that gaps were kept rather than being subsequently filled with reference bases. See #212.
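A minimal sketch of how the alignment step could be invoked, assuming MAFFT is on the `PATH` and writes the alignment to standard output. The file names `sequences.fasta`, `reference.fasta` and `aligned.fasta` are placeholders, not the paths used in this project, and any options beyond the two flags quoted above are not shown because they are not stated here.

```python
import subprocess


def build_mafft_command(query_fasta, reference_fasta):
    """Build the MAFFT argument list with the flags quoted above.

    `--add` takes the sequences to be added; the final positional
    argument is the existing reference. `--keeplength` keeps the output
    at the reference length.
    """
    return ["mafft", "--keeplength", "--add", query_fasta, reference_fasta]


def run_mafft(query_fasta, reference_fasta, output_fasta):
    """Run MAFFT and write its standard output to `output_fasta`."""
    cmd = build_mafft_command(query_fasta, reference_fasta)
    with open(output_fasta, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Placeholder file names, for illustration only.
cmd = build_mafft_command("sequences.fasta", "reference.fasta")
print(" ".join(cmd))
```

Calling `run_mafft("sequences.fasta", "reference.fasta", "aligned.fasta")` would then produce the alignment, with gap characters left in place as described above.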

Some samples have multiple replicate sequences, for example those produced using two different sequencing protocols. For each of these samples, one replicate sequence was chosen using a set of criteria applied to the metadata file. See #209.
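The replicate selection can be sketched as a group-and-pick step. This is only an illustration: the actual criteria are described in #209 and are not reproduced here, so the column names (`sample`, `run`, `coverage`) and the ranking rule below are hypothetical stand-ins, not the fields of `run_metadata.v04.tsv.gz`.

```python
from collections import defaultdict


def choose_replicates(rows, sample_key, rank):
    """Keep one metadata row per sample: the row that `rank` scores highest."""
    by_sample = defaultdict(list)
    for row in rows:
        by_sample[row[sample_key]].append(row)
    return {sample: max(group, key=rank) for sample, group in by_sample.items()}


# Made-up rows; the columns and values are hypothetical.
rows = [
    {"sample": "S1", "run": "R1", "coverage": 0.97},
    {"sample": "S1", "run": "R2", "coverage": 0.99},
    {"sample": "S2", "run": "R3", "coverage": 0.95},
]

# Hypothetical criterion: prefer the replicate with the highest coverage.
chosen = choose_replicates(rows, "sample", rank=lambda r: r["coverage"])
print({sample: row["run"] for sample, row in chosen.items()})
```

Passing the ranking as a function keeps the grouping logic separate from the criteria, so the real rules from #209 could be swapped in without changing the rest.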

URLs

Viridian consensus sequences input to UShER: Viridian_tree_cons_seqs.tar

Metadata file: run_metadata.v04.tsv.gz

Citations

Hunt et al. (2024) https://doi.org/10.1101%2F2024.04.29.591666

Katoh & Standley (2013) https://doi.org/10.1093%2Fmolbev%2Fmst010

Katoh et al. (2002) https://academic.oup.com/nar/article/30/14/3059/2904316