lenaschimmel / sc2rf

SARS-Cov-2 Recombinant Finder for fasta sequences
MIT License

GISAID XT recombinant not detected by sc2rf #31

Open BenjaminDelisle opened 2 years ago

BenjaminDelisle commented 2 years ago

Hi, I've noticed that sc2rf.py (version sc2rf-7427d2f94b69c965362034c2597b643c5dfaa1cf) could not find any recombination for the XT samples available on GISAID when run as follows:

    python sc2rf.py nextclade.aligned_XT_Gisaid.fasta

The aligned sequences are attached: nextclade.aligned_XT_Gisaid.txt

[Image: Nextclade output] [Image: sc2rf output]

Thanks for looking into this and other lineages that might be in the same situation.

ktmeaton commented 2 years ago

Hi @BenjaminDelisle,

I think for XT, the flag --unique 1 is required, because there is only 1 SNP contributed by a BA.1 parent (A26530G). For added confidence, the flag --enable-deletions is helpful. This way, we can confirm that the 3' end of the genome is from BA.1, as it lacks the S2M deletion (29734:29759) which defines BA.2 (and its descendants BA.3, BA.4, BA.5).
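As a rough illustration of why these two flags matter for XT, the sketch below mimics both checks in plain Python. It is not sc2rf's code: the function names, the input shapes, and the toy data are assumptions made for this example, and the BA.2 positions are placeholders rather than real marker sites. Only the BA.1 SNP at 26530 and the deletion range 29734:29759 come from the comment above. Coordinates are assumed to be 1-based and inclusive on a 29903-nt reference-aligned sequence.

```python
from collections import Counter


def candidate_parents(unique_hits, min_unique):
    """Return parents supported by at least `min_unique` parent-specific SNPs.

    unique_hits: list of (position, parent) pairs, one per SNP in the sample
    that is unique to a single potential parent (hypothetical input shape).
    """
    counts = Counter(parent for _, parent in unique_hits)
    return sorted(p for p, n in counts.items() if n >= min_unique)


def has_deletion(aligned_seq, start, end, gap="-"):
    """True if every base in the 1-based inclusive range start..end is a gap."""
    region = aligned_seq[start - 1:end]
    return len(region) == end - start + 1 and set(region) == {gap}


# Toy XT-like sample. The BA.2 positions are placeholders, not real markers;
# 26530 is the single BA.1 SNP (A26530G) mentioned in the thread.
toy_hits = [(1000, "BA.2"), (2000, "BA.2"), (3000, "BA.2"), (26530, "BA.1")]

print(candidate_parents(toy_hits, min_unique=2))  # ['BA.2']
print(candidate_parents(toy_hits, min_unique=1))  # ['BA.1', 'BA.2']

# Toy aligned sequences of reference length (29903 is an assumption here).
GENOME_LEN = 29903
ba1_like = "A" * GENOME_LEN                              # no gap in the range
ba2_like = "A" * 29733 + "-" * 26 + "A" * (GENOME_LEN - 29759)

print(has_deletion(ba1_like, 29734, 29759))  # False
print(has_deletion(ba2_like, 29734, 29759))  # True
```

With a threshold of 2, the lone BA.1 SNP is not enough for BA.1 to be considered as a parent, so no recombination can be reported. Lowering the threshold to 1 admits BA.1, and the absence of the deletion at the 3' end is an independent piece of evidence pointing the same way.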

With these parameters, I get a breakpoint interval of 26061:26529 for XT, which is very close to https://github.com/cov-lineages/pango-designation/issues/478 (26062:26528).

    git clone https://github.com/lenaschimmel/sc2rf.git
    cd sc2rf
    git checkout 7427d2f94b69c96536
    python3 sc2rf.py nextclade.aligned_XT_Gisaid.txt --unique 1 --enable-deletions
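The breakpoint interval quoted above is the gap between the last position assigned to one parent and the first position assigned to the other. A minimal sketch of that arithmetic follows, assuming regions are given as (start, end, parent) tuples sorted by start. The tuple layout and function name are made up for this example; the region coordinates 670:26060 and 26530:29510 are the ones shown in the post-processed table further down this thread.

```python
def breakpoint_intervals(regions):
    """Return (first, last) positions of each gap between adjacent regions
    that are assigned to different parents.

    regions: list of (start, end, parent) tuples sorted by start
    (hypothetical layout, 1-based inclusive coordinates).
    """
    intervals = []
    for (_, end1, parent1), (start2, _, parent2) in zip(regions, regions[1:]):
        if parent1 != parent2:
            intervals.append((end1 + 1, start2 - 1))
    return intervals


xt_regions = [(670, 26060, "BA.2"), (26530, 29510, "BA.1")]
found = breakpoint_intervals(xt_regions)
print(found)  # [(26061, 26529)]

# Interval from the pango-designation issue linked above.
designated = (26062, 26528)
offsets = (designated[0] - found[0][0], found[0][1] - designated[1])
print(offsets)  # (1, 1)
```

The two intervals differ by a single position at each end.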


BenjaminDelisle commented 2 years ago

Thanks for finding this @ktmeaton! This precludes a straightforward implementation of sc2rf in our pipeline. We will have to reflect on this.

ktmeaton commented 2 years ago

> This precludes a straightforward implementation of sc2rf in our pipeline.

Recombinants with very few alleles contributed by a donor (e.g. XP, XT) are extremely difficult to detect systematically without introducing a large number of false positives :( Not that I recommend this, but I've got some gnarly post-processing in my fork of sc2rf. It's an example of one way of tackling this problem, not a rigorously tested solution.

  1. Clone the fork.

    git clone https://github.com/ktmeaton/sc2rf.git sc2rf-ktmeaton
    cd sc2rf-ktmeaton
  2. Install post-processing dependencies.

    pip install pandas click
  3. Run sc2rf with highly-sensitive parameters.

    python3 sc2rf.py nextclade.aligned_XT_Gisaid.txt \
      --parents 2-4 \
      --breakpoints 0-4 \
      --unique 1 \
      --max-ambiguous 20 \
      --max-intermission-length 3 \
      --max-intermission-count 3 \
      --csvfile XT.csv \
      --ignore-shared


  4. Post-process the CSV.

    python3 postprocess.py --csv XT.csv --prefix XT
  5. The post-processed table is XT.tsv:

| strain | sc2rf_parents | sc2rf_regions | sc2rf_breakpoints | sc2rf_num_breakpoints | sc2rf_regions_length |
|---|---|---|---|---|---|
| hCoV-19/South_Africa/NICD-N33091/2021 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N33825/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N33849/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N35577/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N36231/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N37349/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NCV1024/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N37519/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N37608/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N37626/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-CRDM03060/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |

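
For anyone wanting to consume this table in a pipeline, here is a small sketch of how the sc2rf_regions column could be unpacked and cross-checked against sc2rf_regions_length. It assumes the file is tab-separated with the column names shown in the header above; the helper names are invented for this example and are not part of sc2rf or postprocess.py. The one row embedded below is copied from the table.

```python
import csv
import io


def parse_regions(field):
    """Split 'start:end|parent,start:end|parent' into (start, end, parent) tuples."""
    regions = []
    for item in field.split(","):
        coords, parent = item.split("|", 1)
        start, end = (int(x) for x in coords.split(":"))
        regions.append((start, end, parent))
    return regions


def region_lengths(regions):
    """Length of each region as end - start, which matches the table's values."""
    return [end - start for start, end, _ in regions]


# One row from the table, inlined so the example runs without XT.tsv on disk.
TSV = (
    "strain\tsc2rf_regions\tsc2rf_regions_length\n"
    "hCoV-19/South_Africa/NICD-N33091/2021\t"
    "670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1\t25390,2980\n"
)

rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))
for row in rows:
    regions = parse_regions(row["sc2rf_regions"])
    reported = [int(x) for x in row["sc2rf_regions_length"].split(",")]
    print(row["strain"], regions, region_lengths(regions) == reported)
```

The 2980-nt BA.1 region is short compared with the 25390-nt BA.2 region, so a downstream filter on minimum region length would be one place where thresholds need care for lineages like XT.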
BenjaminDelisle commented 2 years ago

Thanks @ktmeaton! I'll have a look at this shortly.