samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
649 stars 240 forks source link

Incorrect number of FORMAT/GL values with bcftools tag2tag output #2055

Closed isinaltinkaya closed 9 months ago

isinaltinkaya commented 9 months ago

Hi,

I am using version: 1.18-31-g4014f7e2 (using htslib 1.18-72-g9e1ffd85).

Commands:

${bcftools} plugin tag2tag
${bcftools} mpileup t2_files/t2_ind1.sam --fasta-ref t2_files/t2_ref.fa|${bcftools} plugin tag2tag -- --PL-to-GL | ${bcftools} call -m -O b > delme_ind1.bcf
${bcftools} mpileup t2_files/t2_ind2.sam --fasta-ref t2_files/t2_ref.fa|${bcftools} plugin tag2tag -- --PL-to-GL | ${bcftools} call -m -O b > delme_ind2.bcf
${bcftools} index delme_ind1.bcf
${bcftools} index delme_ind2.bcf
${bcftools} merge delme_ind1.bcf delme_ind2.bcf

I get this error:

Incorrect number of FORMAT/GL values at ref1:2, cannot merge. The tag is defined as Number=G, but found
3 values and 1 alleles. See also http://samtools.github.io/bcftools/howtos/FAQ.html#incorrect-nfields

It works fine when I remove the bcftools plugin tag2tag part and directly pipe to bcftools call.

Input files:

$ cat t2_files/t2_ind1.sam 
@HD VN:0    SO:coordinate
@SQ SN:ref1 LN:10
*   0   ref1    2   255 1M  *   0   0   A   5
*   0   ref1    2   255 1M  *   0   0   A   5
*   0   ref1    4   255 3M  *   0   0   ACC 555
*   0   ref1    4   255 2M  *   0   0   AC  55
*   0   ref1    4   255 1M  *   0   0   A   5
$ cat t2_files/t2_ind1.sam
@HD VN:0    SO:coordinate
@SQ SN:ref1 LN:10
*   0   ref1    2   255 1M  *   0   0   A   5
*   0   ref1    2   255 1M  *   0   0   A   5
*   0   ref1    4   255 3M  *   0   0   ACC 555
*   0   ref1    4   255 2M  *   0   0   AC  55
*   0   ref1    4   255 1M  *   0   0   A   5
*   0   ref1    4   255 1M  *   0   0   C   5
$ cat t2_files/t2_ind2.sam
@HD VN:0    SO:coordinate
@SQ SN:ref1 LN:10
*   0   ref1    2   255 1M  *   0   0   A   5
*   0   ref1    5   255 2M  *   0   0   CA  55
*   0   ref1    5   255 2M  *   0   0   AA  55
$ cat t2_files/t2_ref.fa  
>ref1
AAAACCAAAA
pd3 commented 9 months ago

Thank you for the test case.

The program call expects direct output from mpileup and unaware of GL, trims only PL fields when alleles are removed. In principle it could be made smarter, but I am not convinced it is worth it. Simply change the pipeline to

mpileup | call | tag2tag