ktmeaton / ncov-recombinant

Reproducible workflow for SARS-CoV-2 recombinant sequence detection.
MIT License

Upgrade sc2rf virus_properties 2022-09-27 #162

Closed: ktmeaton closed this issue 2 years ago

ktmeaton commented 2 years ago

Before updating (XD):

python3 sc2rf/sc2rf.py results/XD/nextclade/alignment.fasta \
  --clades 21I 21J 21K 21L \
  --ansi \
  --parents 2-10 \
  --breakpoints 0-10 \
  --unique 1 \
  --max-intermission-length 3 \
  --max-intermission-count 3 \
  --ignore-shared \
  --mutation-threshold 0.25

After updating (XD):

python3 sc2rf/sc2rf.py results/XD/nextclade/alignment.fasta \
  --clades 21J 21K \
  --ansi \
  --parents 0-10 \
  --breakpoints 0-10 \
  --unique 0 \
  --max-intermission-length 2 \
  --max-intermission-count 20 \
  --ignore-shared \
  --mutation-threshold 0.25

ktmeaton commented 2 years ago

Ah! Something is wrong with sc2rf/virus_properties.json after updating. Some clades now have an empty "mutations" list:

        {
            "NextstrainClade": "21I",
            "PangoLineage": "",
            "Letter": "\u03b4",
            "WhoLabel": "Delta",
            "Other": "",
            "WhoClass": "VOC",
            "Query": "",
            "mutations": [],
            "name": "Delta / 21I"
        },
        {
            "NextstrainClade": "21J",
            "PangoLineage": "",
            "Letter": "\u03b4",
            "WhoLabel": "Delta",
            "Other": "",
            "WhoClass": "VOC",
            "Query": "",
            "mutations": [],
            "name": "Delta / 21J"
        },

The clades that are affected are:
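A quick way to find them, as a minimal sketch: it assumes the clade records have already been loaded from sc2rf/virus_properties.json into a Python list (where that list sits inside the file is not shown above, so the loading step is left out). The helper name clades_without_mutations and the third sample record are made up for illustration.

```python
import json


def clades_without_mutations(records):
    """Return the 'name' of every record whose 'mutations' list is empty or missing."""
    return [
        rec.get("name", rec.get("NextstrainClade", "?"))
        for rec in records
        if not rec.get("mutations")
    ]


# Two records shaped like the excerpt above, plus one made-up healthy record.
sample = json.loads("""
[
  {"NextstrainClade": "21I", "mutations": [], "name": "Delta / 21I"},
  {"NextstrainClade": "21J", "mutations": [], "name": "Delta / 21J"},
  {"NextstrainClade": "XX", "mutations": ["A1T"], "name": "Hypothetical / XX"}
]
""")

print(clades_without_mutations(sample))
```

Running a check like this right after each update would flag an empty download before it silently changes the sc2rf results.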

ktmeaton commented 2 years ago

This query no longer works:

https://lapis.cov-spectrum.org/open/v1/sample/nuc-mutations?nextstrainClade=21J%20(Delta)

But this query does!

https://lapis.cov-spectrum.org/open/v1/sample/nuc-mutations?nextstrainClade=21J

This is because the WHO labels were removed from the clade names in cov-spectrum Issue #546! Aha!
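The difference between the two queries can be sketched in Python. This is only an illustration of building the working URL from an old-style clade name. The helper names strip_who_label and nuc_mutations_url are hypothetical, and this is not necessarily how sc2rf builds its requests. No network call is made.

```python
import re
from urllib.parse import quote, urlencode

# Endpoint copied from the queries above.
LAPIS_NUC_MUTATIONS = "https://lapis.cov-spectrum.org/open/v1/sample/nuc-mutations"


def strip_who_label(clade):
    """Drop a trailing parenthesised label: '21J (Delta)' becomes '21J'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", clade)


def nuc_mutations_url(clade):
    """Build the nuc-mutations query URL using the bare Nextstrain clade."""
    query = urlencode({"nextstrainClade": strip_who_label(clade)}, quote_via=quote)
    return f"{LAPIS_NUC_MUTATIONS}?{query}"


print(nuc_mutations_url("21J (Delta)"))
```

Stripping the label at query time means existing clade names that still carry a WHO label keep working without editing them by hand.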

ktmeaton commented 2 years ago

I think I'll version/date control virus_properties.json like this:

echo "Update:" $(date +'%Y-%m-%d') > virus_properties.log
python3 sc2rf.py --rebuild-examples 2>&1 1>> virus_properties.log

ktmeaton commented 2 years ago

When updating, we lose XAN, because its breakpoint uncertainty is over 10000.

Is it possible to relax/remove the breakpoint length filter? max_breakpoint_len
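To make the question concrete, here is a hypothetical sketch of a length filter that can be switched off. Only the name max_breakpoint_len comes from the question. The function, the (start, end) interval representation, the inclusive length calculation, and the sample coordinates are all assumptions, not the pipeline's actual code.

```python
def passes_breakpoint_filter(breakpoints, max_breakpoint_len=None):
    """Check breakpoint uncertainty intervals against an optional length cap.

    breakpoints: list of (start, end) genome positions, inclusive (assumed).
    max_breakpoint_len: longest allowed interval; None disables the filter.
    """
    if max_breakpoint_len is None:
        return True
    return all(end - start + 1 <= max_breakpoint_len for start, end in breakpoints)


# Made-up interval spanning more than 10000 positions.
uncertain = [(8000, 19000)]

print(passes_breakpoint_filter(uncertain, max_breakpoint_len=10000))
print(passes_breakpoint_filter(uncertain))
```

With the cap set to 10000 the made-up interval is rejected. With the default of None it is kept, which is the "remove the filter" behaviour asked about.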