Closed warthmann closed 7 months ago
Hi @warthmann,
I understand what you are trying to do. Looking at the dysgu code, it should be jointly calling the 3 control samples, but the name in the vcf output will be given as the first sample in the the read group list, or the file name. There might also be a warning in the dysgu output stating more than one sample name in the read groups. The relevant code is below:
if "RG" in infile.header:
rg = infile.header["RG"]
if "SM" in rg[0]:
if len(rg) > 1:
sms = set([i["SM"] for i in rg])
if len(sms) > 1:
logging.warning("Warning: more than one sample in @RG, using first sample (SM) for output: {}".format(rg[0]["SM"]))
sample_name = rg[0]["SM"]
else:
sample_name = os.path.splitext(os.path.basename(args["sv_aligns"]))[0]
logging.warning("Warning: no SM tag in @RG (read-group), using input file name as sample name for output: {}".format(sample_name))
else:
sample_name = os.path.splitext(os.path.basename(args["sv_aligns"]))[0]
logging.warning("Warning: no @RG, using input file name as sample name for output: {}".format(sample_name))
Another way to check to see what dysgu has analysed - if you keep the temp files (no --clean
flag) and have a look at the .dysgu_reads.bam file (testing a few megabases of your input should be sufficient)
alright! great! indeed, the warning was there:
[WARNING] Warning: more than one sample in @ RG, using first sample (SM) for output: xxxx
Also, the .dysgu_reads.bam
file in the ./temp directory does contain read mappings from all 3 samples and they kept their respective RGs.
samtools samples temp.dysgu_reads.bam
A feature request hence will be to introduce an option of specifying a sample name for the joint sample and possibly even edit the RG flags.
Qnother clarification thing. I am now doing
dysgu filter -p8 treatment-svs.vcf control-temp/control-temp.dysgu_reads.bam > treatment-vs-con.vcf
I am getting the same warning. with con-2_S1 being one of the control samples in the bam file
[WARNING] Warning: more than one sample in @ RG, using first sample (SM) for output: con-2_S1
Is dysgu using all alignments in the control-temp.dysgu_reads.bam file for assessment/filtering or only the ones with RG con-2_S1?
It will be using all the reads.
Dear Kez, I am trying to pipe
samtools view
output todysgu run
, and I am uncertain of the behaviour to expect. It would be nice if you could clarify. I have a multi-sample bam file; 6 samples, each sample with a unique RG (Read Group). Three (3) samples in the bam file are control replicates (3 different RGs) and 3 samples are treatment replicates (another 3 RGs). For SV calling I would like to compare the treatment replicates jointly to the control replicates: 1) calling SVs jointly in all controls 2) calling SVs jointly in all treatment samples, 3) running dysgu filter with treatment vs control.With the below I am piping all 3 control samples to dysgu, however, I am finding that the resulting svs.vcf contains only one sample labeled with the first RG in the list (sample-names-control). Is this now a joint sample only labeled with the first RG or has dysgu only analysed the first sample. Any help clarifying and/or advice in how to perform the desired operation in another way would be nice.