Closed xiadawei123 closed 6 months ago
The sample names are inferred from the information contained in the BAM files, specifically read groups (see the @RG header lines and the SM subfield in the SAM specification https://samtools.github.io/hts-specs/SAMv1.pdf).
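To see which sample name each read group carries, the @RG header lines can be inspected directly. The following is a minimal sketch, assuming the header text has already been dumped (for example with `samtools view -H file.bam`); the function name `read_group_samples` and the example header are hypothetical and only for illustration.

```python
def read_group_samples(header_text):
    """Map each @RG ID to its SM value (None if the SM tag is absent)."""
    mapping = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "ID" in fields:
            mapping[fields["ID"]] = fields.get("SM")
    return mapping


# Hypothetical header text, not taken from the files discussed in this thread.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sampleA\tPL:ILLUMINA\n"
    "@RG\tID:rg2\tSM:sampleA\n"
)
print(read_group_samples(example_header))
```

If several BAM files report the same SM value, their reads would be expected to be grouped under that one sample name.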
One sample can be contained in multiple BAMs and one BAM can have multiple samples, and the program attempts to match them all correctly by the read group and SM name. The behavior can be fine-tuned with the bcftools mpileup -G option; see the description in the manual page http://samtools.github.io/bcftools/bcftools.html#mpileup
*-G, --read-groups* [^]'FILE'::
list of read groups to include or exclude if prefixed with "^".
One read group per line. This file can also be used to assign new sample
names to read groups by giving the new sample name as a second
white-space-separated field, like this: "read_group_id new_sample_name".
If the read group name is not unique, also the bam file name can
be included: "read_group_id file_name sample_name". If all
reads from the alignment file should be treated as a single sample, the
asterisk symbol can be used: "* file_name sample_name". Alignments without
a read group ID can be matched with "?". *NOTE:* The meaning of *bcftools mpileup -G*
is the opposite of *samtools mpileup -G*.
----
RG_ID_1
RG_ID_2 SAMPLE_A
RG_ID_3 SAMPLE_A
RG_ID_4 SAMPLE_B
RG_ID_5 FILE_1.bam SAMPLE_A
RG_ID_6 FILE_2.bam SAMPLE_A
* FILE_3.bam SAMPLE_C
? FILE_3.bam SAMPLE_D
----
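If every BAM file should be treated as its own sample, the "* file_name sample_name" form described above can be generated from the list of BAM paths. This is only a sketch under stated assumptions: the helper name `read_groups_lines` is hypothetical, the sample name is simply derived from the file name, and the file name is written exactly as it would be given to mpileup (whether a different spelling of the path also matches is not covered here).

```python
import os


def read_groups_lines(bam_paths):
    """Build one '* file_name sample_name' line per BAM file."""
    lines = []
    for path in bam_paths:
        # Assumption: the sample name is the file name without its extension.
        sample = os.path.splitext(os.path.basename(path))[0]
        lines.append(f"* {path} {sample}")
    return lines


# Hypothetical BAM paths, for illustration only.
bams = ["aln/s1.bam", "aln/s2.bam"]
print("\n".join(read_groups_lines(bams)))
```

The resulting lines would be saved to a text file and passed to `bcftools mpileup` with the `-G` option. Sample names containing whitespace would not work with this sketch, since the fields are white-space-separated.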
Hi, I am using bcftools for SNP calling, but why does it report only one sample when I pass 12 BAM files, as shown below?
$ bcftools mpileup -a AD,ADF,ADR,DP,SP,INFO/AD,INFO/ADF,INFO/ADR -b specieal.txt -f ../../Bins_modified.fasta -q 20 -Q 20 -O u --threads 20 | bcftools call -c -v -O z --threads 20 -o chloroplast.vcf.gz
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 12 input files
[mpileup] maximum number of reads per input file set to -d 250