salzberg-lab / bolotie

SARS-CoV-2: detecting recombinations in viruses using large data sets with high sequence similarity

Example of the Reference Input #4

ShadiKhoury closed 2 years ago

ShadiKhoury commented 2 years ago

Hello, I'm trying to run this repository and wanted to know what file type the reference input should be. Is it FASTA, TSV, etc.? I would appreciate an example :)

Best wishes,

Shadi

alevar commented 2 years ago

Hi Shadi,

I apologize for not having a good example within the repository. I have now added a tiny example based on the HIV reference genome. The data is available in the example_hiv directory.

Within the directory you will find the following files:

  1. ref.fa - HXB2 reference genome for HIV-1
  2. seqs.fa - mutated sequences used to construct the index and test for recombination. The first 10 sequences belong to clade 1, sequence #1 is the recombinant labeled as clade #1, and the remaining 10 sequences are in clade 2.
  3. seqs.clus - the clusters file, i.e. the clade for each sequence in the seqs.fa file, listed in the same order.
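
Because the clusters file has to follow the order of seqs.fa, a quick sanity check before running can save a confusing failure. The following is only a minimal sketch: the helper names are hypothetical, and it assumes the clusters file holds one clade label per non-empty line, which this thread does not confirm.

```python
def count_fasta_records(fasta_path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_cluster_labels(clusters_path):
    """Count cluster labels.

    ASSUMPTION: one label per non-empty line; the actual layout of the
    clusters file is not described in this thread.
    """
    with open(clusters_path) as handle:
        return sum(1 for line in handle if line.strip())


def check_inputs(fasta_path, clusters_path):
    """Return (n_sequences, n_labels, counts_match)."""
    n_seqs = count_fasta_records(fasta_path)
    n_labels = count_cluster_labels(clusters_path)
    return n_seqs, n_labels, n_seqs == n_labels
```

Calling check_inputs("./example_hiv/seqs.fa", "./example_hiv/seqs.clus") would report whether the two counts agree. It cannot tell whether the labels are in the right order, only that none are missing.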

Here is a command you can use to run the example from start to finish:

    run.py --input ./example_hiv/seqs.fa --reference ./example_hiv/ref.fa --outdir ./res_hiv --keep_tmp --freq-threshold 3 --trim-len 100 --clusters ./example_hiv/seqs.clus
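
If you would rather drive the example from a Python script, the same command can be assembled as an argument list. This is a sketch, not part of the repository: build_command and its two parameters are hypothetical names, and the flags are copied verbatim from the command in this comment.

```python
import shlex


def build_command(example_dir="./example_hiv", outdir="./res_hiv"):
    """Assemble the example invocation as an argument list.

    Flags and values are copied from the command quoted in this thread;
    the function itself is a hypothetical helper for illustration.
    """
    return [
        "run.py",
        "--input", f"{example_dir}/seqs.fa",
        "--reference", f"{example_dir}/ref.fa",
        "--outdir", outdir,
        "--keep_tmp",
        "--freq-threshold", "3",
        "--trim-len", "100",
        "--clusters", f"{example_dir}/seqs.clus",
    ]


# Show the command as a single shell-style string for inspection.
print(shlex.join(build_command()))
```

To actually launch it, the list can be passed to subprocess.run(cmd, check=True), assuming run.py is executable and reachable from the working directory. Keeping the arguments as a list avoids shell quoting problems if the paths ever contain spaces.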

Hope this helps! Please let me know if you run into any other issues!

Best,

Ales

ShadiKhoury commented 2 years ago

thx