GISAID XT recombinant not detected by sc2rf

BenjaminDelisle commented 2 years ago

Hi, I've noticed that sc2rf.py (version sc2rf-7427d2f94b69c965362034c2597b643c5dfaa1cf) could not find any recombination in the XT samples available on GISAID when run as: python sc2rf.py nextclade.aligned_XT_Gisaid.fasta. The aligned sequences are attached here: nextclade.aligned_XT_Gisaid.txt

[Screenshots: Nextclade output; sc2rf output]

Thanks for looking into this and other lineages that might be in the same situation.

ktmeaton commented 2 years ago

Hi @BenjaminDelisle,

I think for XT, the flag --unique 1 is required, because there is only 1 SNP contributed by the BA.1 parent (A26530G). For added confidence, the flag --enable-deletions is helpful. This way, we can confirm that the 3' end of the genome is from BA.1, as it lacks the S2M deletion (29734:29759) which defines BA.2 (and its descendants BA.3, BA.4, BA.5).

With these parameters, I get a breakpoint interval of 26061:26529 for XT, which is very close to https://github.com/cov-lineages/pango-designation/issues/478 (26062:26528).

git clone https://github.com/lenaschimmel/sc2rf.git
cd sc2rf
git checkout 7427d2f94b69c96536
python3 sc2rf.py nextclade.aligned_XT_Gisaid.txt --unique 1 --enable-deletions
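The two pieces of evidence mentioned above (the single BA.1 SNP A26530G and the absence of the 29734:29759 deletion) can also be checked directly on an aligned genome. This is a minimal sketch, not part of sc2rf: the function names are hypothetical, and it assumes the sequence is aligned to the reference so that string index i-1 holds genome position i, and that the deletion coordinates are 1-based and inclusive.

```python
# Assumption: `seq` is a single genome aligned to the reference
# (e.g. from Nextclade's aligned FASTA), so position i is seq[i - 1].
# All names below are hypothetical helpers, not sc2rf functions.

S2M_DELETION = (29734, 29759)  # coordinates from the comment above, assumed inclusive
BA1_SNP_POS = 26530            # A26530G, the only BA.1-specific SNP in XT
BA1_SNP_ALT = "G"


def has_deletion(seq: str, start: int, end: int) -> bool:
    """Return True if every aligned position in start..end is a gap character."""
    region = seq[start - 1:end]
    return len(region) == end - start + 1 and set(region) == {"-"}


def base_at(seq: str, pos: int) -> str:
    """Return the upper-cased base at a 1-based genome position."""
    return seq[pos - 1].upper()


def has_ba1_3prime_evidence(seq: str) -> bool:
    """True if the genome carries A26530G and lacks the S2M deletion."""
    return (
        base_at(seq, BA1_SNP_POS) == BA1_SNP_ALT
        and not has_deletion(seq, *S2M_DELETION)
    )
```

A genome that passes this check is only consistent with a BA.1-derived 3' end; it says nothing about the rest of the genome, which is why sc2rf still needs to see the BA.2 mutations upstream.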

BenjaminDelisle commented 2 years ago

Thanks for finding this @ktmeaton! This precludes a straightforward implementation of sc2rf in our pipeline. We will have to reflect on this.

ktmeaton commented 2 years ago

> This precludes a straightforward implementation of sc2rf in our pipeline.

Recombinants with very few alleles contributed by a donor (e.g. XP, XT) are extremely difficult to detect systematically without introducing a large number of false positives :( Not that I recommend this... but I've got some gnarly post-processing in my fork of sc2rf. It's an example of one way of tackling this problem, but not a rigorously tested solution.

Clone the fork

git clone https://github.com/ktmeaton/sc2rf.git sc2rf-ktmeaton
cd sc2rf-ktmeaton

Install post-processing dependencies.
```
pip install pandas click
```

Run sc2rf with highly sensitive parameters.

python3 sc2rf.py nextclade.aligned_XT_Gisaid.txt \
  --parents 2-4 \
  --breakpoints 0-4 \
  --unique 1 \
  --max-ambiguous 20 \
  --max-intermission-length 3 \
  --max-intermission-count 3 \
  --csvfile XT.csv \
  --ignore-shared

Post-process the CSV

python3 postprocess.py --csv XT.csv --prefix XT

The post-processed table is XT.tsv:

| strain | sc2rf_parents | sc2rf_regions | sc2rf_breakpoints | sc2rf_num_breakpoints | sc2rf_regions_length |
| --- | --- | --- | --- | --- | --- |
| hCoV-19/South_Africa/NICD-N33091/2021 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N33825/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N33849/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N35577/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N36231/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N37349/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NCV1024/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N37519/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N37608/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-N37626/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
| hCoV-19/South_Africa/NICD-CRDM03060/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
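For anyone consuming this table in a pipeline, the sc2rf_breakpoints and sc2rf_regions_length columns can be re-derived from sc2rf_regions. The sketch below is inferred only from the values shown in the table, not from postprocess.py itself; the function names are hypothetical. It assumes a breakpoint interval is the gap between the last position of one region and the first position of the next, and that a region's length is end minus start, which is what the table's numbers imply.

```python
# Hypothetical helpers inferred from the table values above;
# this is not the code used by postprocess.py.

def parse_regions(regions: str):
    """Parse 'start:end|parent,start:end|parent' into (start, end, parent) tuples."""
    parsed = []
    for item in regions.split(","):
        coords, parent = item.split("|")
        start, end = (int(x) for x in coords.split(":"))
        parsed.append((start, end, parent))
    return parsed


def breakpoint_intervals(parsed):
    """Interval between consecutive regions: one past the previous end to one before the next start."""
    return [
        f"{prev_end + 1}:{next_start - 1}"
        for (_, prev_end, _), (next_start, _, _) in zip(parsed, parsed[1:])
    ]


def region_lengths(parsed):
    """Length as end - start, matching the sc2rf_regions_length column."""
    return [end - start for start, end, _ in parsed]


xt_regions = parse_regions("670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1")
print(breakpoint_intervals(xt_regions))  # ['26061:26529']
print(region_lengths(xt_regions))        # [25390, 2980]
```

A check like this is a cheap way to flag rows where the number of breakpoints or the size of the minor parent's region falls outside what a pipeline is willing to accept.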

BenjaminDelisle commented 2 years ago

Thanks @ktmeaton! I will look into this shortly.

lenaschimmel / sc2rf

GISAID XT recombinant not detected by sc2rf #31