lenaschimmel / sc2rf

SARS-Cov-2 Recombinant Finder for fasta sequences
MIT License

Bug/question with --force-all-parents --clades all #35

Open MarieLataretu opened 1 year ago

MarieLataretu commented 1 year ago

Hi there,

I was wondering why I got no output, so I tried the second example from here: https://github.com/lenaschimmel/sc2rf#no-output--some-sequences-not-shown

So I added --clades all --force-all-parents to my call, but it seems that the two can't be used together:

The number of allowed parents, the number of selected clades, and the --force-all-parents conflict so that the results must be empty.

Also, --clades all can't be used as the last argument (directly before the input), because then the input isn't recognized:

Input sequences must be provided, except when rebuilding the examples. Use --help for more info. Program exits.

I'm not sure if this is only my setup/input problem.


Would you suggest using -c all or -f? My full command is

  python3 sc2rf.py --csvfile ../${name}_sc2rf.csv --parents 1-35 --breakpoints 1-2 \
                      --max-intermission-count 3 --max-intermission-length 1 \
                      --unique 1 --max-ambiguous 10000 --max-name-length 55 \
                      ### --clades all  --force-all-parents  \ ###
                      ../${fasta}

Best Marie

corneliusroemer commented 1 year ago

I'm sorry I can't help directly but maybe @ktmeaton can? She's the most knowledgeable person about sc2rf as far as I'm aware :)

ktmeaton commented 1 year ago

Hi Marie,

Here's my understanding of the problematic parameters.

With these arguments, --parents 1-35 conflicts with --clades all, which includes 36 clades. As I understand it, --force-all-parents requires all 36 selected clades to be treated as parents, which exceeds the upper bound of 35. My simple fix is to set --parents to an extremely high number (e.g. --parents 1-1000). The following command and example data should not generate the warning about conflicting arguments.

Example data of 6 recombinants from GenBank: alignment.fasta.gz (gunzip first)

python3 sc2rf.py alignment.fasta \
    --csvfile tutorial.csv \
    --breakpoints 1-2 \
    --max-intermission-count 3 \
    --max-intermission-length 1 \
    --unique 1 \
    --max-ambiguous 10000 \
    --max-name-length 55 \
    --clades all \
    --force-all-parents \
    --parents 1-1000
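
To illustrate the conflict, here is a minimal sketch of the kind of range check that would produce that warning. It is not sc2rf's actual code: `parse_range` and `conflicts` are hypothetical names, and the logic assumes that --force-all-parents makes the number of parents equal to the number of selected clades.

```python
def parse_range(text):
    """Parse 'N' or 'MIN-MAX' into an inclusive (min, max) tuple."""
    low, _, high = text.partition("-")
    return int(low), int(high or low)


def conflicts(parents, n_clades, force_all_parents):
    """Return True when forcing every selected clade to be a parent
    cannot satisfy the allowed parent range (assumed behaviour)."""
    low, high = parse_range(parents)
    return force_all_parents and not (low <= n_clades <= high)


# 36 clades cannot fit into an allowed range of 1-35 parents.
print(conflicts("1-35", 36, True))
# Raising the upper bound removes the conflict.
print(conflicts("1-1000", 36, True))
```

Under that assumption, any upper bound of at least 36 would avoid the warning; 1000 is simply a value that stays safe as more clades are added.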

However, with these arguments, no recombination will be detected either. This is because BA.4 and BA.5 really complicated things. From my understanding, there are very few diagnostic mutations that are exclusively found in BA.2 and not in BA.4 or BA.5 (and few diagnostic mutations found in BA.5 but not in BA.2 or BA.4, etc.). From my experience, BA.2, BA.4, and BA.5 cannot all be included as potential parents at the same time; one of them has to be dropped. So the following debugging parameters should work for the example data:

python3 sc2rf.py alignment.fasta \
    --csvfile tutorial.csv \
    --breakpoints 0-10 \
    --max-intermission-count 3 \
    --max-intermission-length 1 \
    --unique 0 \
    --max-ambiguous 10000 \
    --max-name-length 55 \
    --force-all-parents \
    --parents 1-1000 \
    --clades BA.1 BA.2 BA.5 21J
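
To show why dropping one of the closely related clades helps, here is a toy sketch. The mutation labels (m1 to m6) are made up and are not real SARS-CoV-2 mutations, `unique_mutations` is a hypothetical helper, and "diagnostic" is assumed to mean "carried by exactly one of the selected clades", which may differ from how sc2rf defines it internally.

```python
def unique_mutations(clade_mutations, selected):
    """For each selected clade, return the mutations that no other
    selected clade carries."""
    result = {}
    for clade in selected:
        others = set()
        for other in selected:
            if other != clade:
                others |= clade_mutations[other]
        result[clade] = clade_mutations[clade] - others
    return result


# Made-up mutation sets; the overlap pattern is for illustration only.
toy = {
    "BA.2": {"m1", "m2", "m3", "m4"},
    "BA.4": {"m1", "m3", "m5"},
    "BA.5": {"m1", "m2", "m5", "m6"},
}

for selection in (["BA.2", "BA.4", "BA.5"], ["BA.2", "BA.5"]):
    unique = unique_mutations(toy, selection)
    print(selection, {clade: sorted(muts) for clade, muts in unique.items()})
```

In this toy data, BA.2 and BA.5 each keep only one unique mutation when all three clades are selected, and two each once BA.4 is dropped.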

MarieLataretu commented 1 year ago

Thanks for your advice, @ktmeaton ! I'll check what fits best with our current usage.

The other problem was/is that input as a positional argument won't be recognized after any argument that accepts a list. It's somewhat clear; I just expected the readme example to work ☺️
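
For anyone else running into this, the behaviour can be reproduced with plain argparse. This is a minimal sketch, assuming sc2rf declares its input and --clades roughly like the parser below; the argument definitions and the file name seqs.fasta are illustrative, not copied from sc2rf.

```python
import argparse

# Hypothetical parser that mimics a positional input plus a list option.
parser = argparse.ArgumentParser()
parser.add_argument("input", nargs="*")
parser.add_argument("--clades", nargs="*")
parser.add_argument("--unique", type=int)

# List option directly before the input: --clades swallows the file name.
last = parser.parse_args(["--clades", "all", "seqs.fasta"])
print(last.clades, last.input)

# Input first: the positional is consumed before --clades starts collecting.
first = parser.parse_args(["seqs.fasta", "--clades", "all"])
print(first.clades, first.input)

# Another option between the list and the input also ends the list.
between = parser.parse_args(["--clades", "all", "--unique", "1", "seqs.fasta"])
print(between.clades, between.input)
```

So the input either has to come first, as in the commands above, or be separated from the list argument by another option.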