Open BenjaminDelisle opened 2 years ago
Hi @BenjaminDelisle,
I think for XT, the flag --unique 1 is required, because there is only 1 SNP contributed by a BA.1 parent (A26530G). For added confidence, the flag --enable-deletions is helpful. This way, we can confirm that the 3' end of the genome is from BA.1, as it lacks the S2M deletion (29734:29759) which defines BA.2 (and its descendants BA.3, BA.4, BA.5).
With these parameters, I get a breakpoint interval of 26061:26529 for XT, which is very close to https://github.com/cov-lineages/pango-designation/issues/478 (26062:26528).
git clone https://github.com/lenaschimmel/sc2rf.git
cd sc2rf
git checkout 7427d2f94b69c96536
python3 sc2rf.py nextclade.aligned_XT_Gisaid.txt --unique 1 --enable-deletions
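The S2M reasoning above can be illustrated with a small sketch. It assumes the sequence is already aligned to the reference (as in Nextclade's aligned output), so string index i corresponds to genome position i+1, and that deleted bases appear as "-". The function name has_deletion and the toy sequences are hypothetical, and this is not a description of how sc2rf implements --enable-deletions internally.

```python
# Coordinates of the S2M deletion quoted above (1-based, inclusive).
S2M_START, S2M_END = 29734, 29759


def has_deletion(aligned_seq, start=S2M_START, end=S2M_END):
    """Return True if every position in start..end (1-based, inclusive) is a gap."""
    region = aligned_seq[start - 1:end]
    # Guard against sequences too short to cover the region.
    if len(region) != end - start + 1:
        return False
    return set(region) == {"-"}


# Toy aligned sequences; 29903 is the length of the Wuhan-Hu-1 reference.
GENOME_LEN = 29903
ba1_like = "A" * GENOME_LEN
ba2_like = (
    "A" * (S2M_START - 1)
    + "-" * (S2M_END - S2M_START + 1)
    + "A" * (GENOME_LEN - S2M_END)
)

print(has_deletion(ba1_like))  # False: no S2M deletion, 3' end looks BA.1-like
print(has_deletion(ba2_like))  # True: S2M deletion present, BA.2-like
```

A real check would also need to decide how to treat N or partially covered regions, which this sketch simply reports as "no deletion".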
Thanks for finding this @ktmeaton! This precludes a straightforward implementation of sc2rf in our pipeline. We will have to reflect on this.
This precludes a straightforward implementation of sc2rf in our pipeline.
The recombinants with very few alleles contributed by a donor (e.g. XP, XT) are extremely difficult to detect systematically without introducing a large number of false positives :( Not that I recommend this... but I've got some gnarly post-processing in my fork of sc2rf. It's an example of one way of tackling this problem, but not a rigorously tested solution.
Clone the fork
git clone https://github.com/ktmeaton/sc2rf.git sc2rf-ktmeaton
cd sc2rf-ktmeaton
Install post-processing dependencies.
pip install pandas click
Run sc2rf with highly-sensitive parameters.
python3 sc2rf.py nextclade.aligned_XT_Gisaid.txt \
--parents 2-4 \
--breakpoints 0-4 \
--unique 1 \
--max-ambiguous 20 \
--max-intermission-length 3 \
--max-intermission-count 3 \
--csvfile XT.csv \
--ignore-shared
Post-process the CSV
python3 postprocess.py --csv XT.csv --prefix XT
Post-processed table is XT.tsv
strain | sc2rf_parents | sc2rf_regions | sc2rf_breakpoints | sc2rf_num_breakpoints | sc2rf_regions_length |
---|---|---|---|---|---|
hCoV-19/South_Africa/NICD-N33091/2021 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-N33825/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-N33849/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-N35577/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-N36231/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-N37349/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NCV1024/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-N37519/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-N37608/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-N37626/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
hCoV-19/South_Africa/NICD-CRDM03060/2022 | Omicron/BA.2,Omicron/BA.1 | 670:26060\|Omicron/BA.2,26530:29510\|Omicron/BA.1 | 26061:26529 | 1 | 25390,2980 |
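
To see how the sc2rf_breakpoints and sc2rf_regions_length columns relate to sc2rf_regions, here is a minimal sketch that re-derives them from one regions string in the table. The helper names parse_regions and breakpoints_between are hypothetical and are not taken from postprocess.py. The sketch assumes that a breakpoint interval is the gap between two consecutive regions, and that each length is end minus start, which is what the numbers in the table are consistent with.

```python
def parse_regions(regions):
    """Parse 'start:end|parent,start:end|parent' into (start, end, parent) tuples."""
    parsed = []
    for item in regions.split(","):
        coords, parent = item.split("|")
        start, end = coords.split(":")
        parsed.append((int(start), int(end), parent))
    return parsed


def breakpoints_between(regions):
    """Return the gap between each pair of consecutive regions as 'start:end'."""
    return [
        f"{left[1] + 1}:{right[0] - 1}"
        for left, right in zip(regions, regions[1:])
    ]


# Regions string as it appears for every XT sample in the table.
row = "670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1"
regions = parse_regions(row)

print(breakpoints_between(regions))                 # ['26061:26529']
print([end - start for start, end, _ in regions])   # [25390, 2980]
```

The single interval 26061:26529 spans the positions between the last BA.2-assigned site and the lone BA.1-specific SNP at 26530, which is why it is so wide.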
Thanks @ktmeaton ! Will have a look into this shortly.
Hi, I've noticed that sc2rf.py (version sc2rf-7427d2f94b69c965362034c2597b643c5dfaa1cf) could not find any recombination in the XT samples available on GISAID when run with:
python sc2rf.py nextclade.aligned_XT_Gisaid.fasta
Here are the available aligned sequences: nextclade.aligned_XT_Gisaid.txt
[Images: Nextclade result; sc2rf result]
Thanks for looking into this and other lineages that might be in the same situation.