Description of the bug
Hello!
By default viralrecon pulls the Artic V5 bed file from: https://github.com/artic-network/artic-ncov2019/raw/master/primer_schemes/nCoV-2019/V5.3.2/SARS-CoV-2.scheme.bed
However in this version of the bed file, the primer names are mismatched for the following pairs:
SARS-CoV-2_3
SARS-CoV-2_31
SARS-CoV-2_62
SARS-CoV-2_89
SARS-CoV-2_96
This results in reads belonging to those pairs being erroneously filtered out as part of artic minion, so they are missing from the primertrimmed.rg.sorted.bam file. Amplicons then appear "dropped" for those regions, even though there is coverage. I have outlined this issue in more detail here: https://github.com/artic-network/fieldbioinformatics/issues/126
The current workaround I have found is to manually use the bed file from artic-network/primer-schemes: https://github.com/artic-network/primer-schemes/blob/master/nCoV-2019/V5.3.2/SARS-CoV-2.scheme.bed
That file includes a primer sequence column, so using it means removing the sequences from the bed file first, as the collapse_primer_bed.py script expects exactly 6 columns:
line 58: chrom, start, end, name, score, strand = line.strip().split("\t")
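For anyone who wants to apply the same workaround, here is a minimal sketch of the "removing the sequences" step. It assumes the sequence is the only extra column and comes after the standard six; the function name and file names are my own and are not part of viralrecon or artic.

```python
def strip_extra_columns(src: str, dst: str, keep: int = 6) -> int:
    """Copy a primer bed file, keeping only the first `keep` tab-separated columns.

    Returns the number of primer lines written.
    """
    written = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            line = line.rstrip("\r\n")
            # Pass blank lines and comment/header lines through unchanged.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                fout.write(line + "\n")
                continue
            fields = line.split("\t")
            # Drop everything after the sixth column (assumed: the primer sequence).
            fout.write("\t".join(fields[:keep]) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    import os

    # Hypothetical local file names; adjust to wherever the bed file was saved.
    if os.path.exists("SARS-CoV-2.scheme.bed"):
        n = strip_extra_columns("SARS-CoV-2.scheme.bed", "SARS-CoV-2.scheme.6col.bed")
        print(f"wrote {n} primer lines")
```

The resulting 6-column file can then be supplied to the pipeline in place of the default download.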
I think this would affect everyone using the Artic V5 scheme and viralrecon currently. It might be useful to use the bed file in artic-network/primer-schemes for now, and modify collapse_primer_bed.py to handle the sequence column?
Would be happy to open a PR - but think the config needs to be changed in the nf-core/configs repo to modify the download url?
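To illustrate the kind of change I mean for collapse_primer_bed.py, here is a rough sketch of a more tolerant version of the line 58 unpacking. Only that one line is taken from the script; the helper name parse_primer_line and the error handling are hypothetical, and the real script may need a different shape.

```python
def parse_primer_line(line: str):
    """Split one primer bed line, ignoring any columns after the sixth.

    Hypothetical helper: mirrors the six names unpacked on line 58 of
    collapse_primer_bed.py but tolerates a trailing sequence column.
    """
    fields = line.strip().split("\t")
    if len(fields) < 6:
        raise ValueError(
            f"expected at least 6 tab-separated columns, got {len(fields)}"
        )
    # Extra columns (e.g. a primer sequence) are silently dropped.
    chrom, start, end, name, score, strand = fields[:6]
    return chrom, start, end, name, score, strand


if __name__ == "__main__":
    # Made-up line for illustration only, not copied from a real scheme.
    print(parse_primer_line("chr\t25\t50\tprimer_1_LEFT\t1\t+\tACGT\n"))
```

With something like this, both the 6-column and the 7-column bed files would be accepted without editing them by hand.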
Thanks, Sam
Command used and terminal output
No response
Relevant files
No response
System information
No response