Status: Closed (closed by patidarr 7 years ago)
BAMs must contain a Read Group (@RG) with the sample name (SN)
Ahh, I do have an RG tag, but my sample name is stored as SM :(
@RG ID:Sample_NCI-0040_E_C6R72ANXX PL:Illumina LB:Sample_NCI-0040_E_C6R72ANXX SM:Sample_NCI-0040_E_C6R72ANXX
Any chance of accommodating this?
Sorry, typo: the sample name is SM, not SN.
Is "Sample_NCI-0040_E_C6R72ANXX" the very same sample name as in your VCF file?
yes it is.
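For anyone checking this by hand, here is a minimal sketch of the comparison being asked about: pull the SM values out of the @RG lines of a BAM header and see which VCF samples have no matching read group. It only works on header text (for example the output of samtools view -H and the #CHROM line of the VCF). The function names are made up for illustration and are not part of jvarkit.

```python
def read_group_samples(header_text):
    """Return the set of SM values found on @RG lines of a SAM/BAM header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # SAM header fields are TAB-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "SM" and value:
                samples.add(value)
    return samples


def vcf_samples(chrom_line):
    """Return the sample names from a VCF '#CHROM' header line."""
    columns = chrom_line.rstrip("\n").split("\t")
    # The first nine columns are fixed (CHROM .. FORMAT); samples follow.
    return columns[9:]


def samples_without_bam(chrom_line, bam_headers):
    """List VCF samples whose name matches no @RG SM value in any header."""
    covered = set()
    for header_text in bam_headers:
        covered |= read_group_samples(header_text)
    return [name for name in vcf_samples(chrom_line) if name not in covered]
```

An empty result from samples_without_bam means every VCF sample name appears as an SM value in at least one of the supplied headers.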
The trace says you used a file named 'cmd'. Am I wrong?
[main] INFO jvarkit - Command Line args : -d 10 -f cmd
Can you please show me the output of:
cat cmd | xargs ls -la
Sorry, there was a typo in the original command. Here it is:
$ java -jar /data/khanlab/apps/jvarkit/dist/fixvcfmissinggenotypes.jar -d 10 -f cmd
$ cat cmd | xargs ls -la
-rw-rw-r-- 1 patidarr khanlab 11479630128 Jun 21 12:15 /data/khanlab/projects/processed_DATA/NCI0082/DCEG/Sample_NCI-0082_E_C6R72ANXX/Sample_NCI-0082_E_C6R72ANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 10573464321 Jun 21 15:49 /data/khanlab/projects/processed_DATA/RMS003/DCEG/Sample_RMS003_E_C6R9WANXX/Sample_RMS003_E_C6R9WANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 9721901273 Jun 21 07:23 /data/khanlab/projects/processed_DATA/RMS004/DCEG/Sample_RMS004_E_C6R72ANXX/Sample_RMS004_E_C6R72ANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 9880370397 Jun 21 08:46 /data/khanlab/projects/processed_DATA/RMS006/DCEG/Sample_RMS006_E_C6R72ANXX_C6R9WANXX/Sample_RMS006_E_C6R72ANXX_C6R9WANXX.bwa.dd.bam
-rw-rw-r-- 1 patidarr khanlab 24930278775 Jun 21 21:27 /data/khanlab/projects/processed_DATA/RMS006/DCEG/Sample_RMS006_E_C6R72ANXX_C6R9WANXX/Sample_RMS006_E_C6R72ANXX_C6R9WANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 9975663826 Jun 20 20:38 /data/khanlab/projects/processed_DATA/RMS007/DCEG/Sample_RMS007_E_C6R72ANXX/Sample_RMS007_E_C6R72ANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 10558785239 Jun 21 08:51 /data/khanlab/projects/processed_DATA/RMS008/DCEG/Sample_RMS008_E_C6R72ANXX_C6R9WANXX/Sample_RMS008_E_C6R72ANXX_C6R9WANXX.bwa.dd.bam
And here is the RG tag:
$ samtools view -H /data/khanlab/projects/processed_DATA/NCI0082/DCEG/Sample_NCI-0082_E_C6R72ANXX/Sample_NCI-0082_E_C6R72ANXX.bwa.final.bam |grep RG
@RG ID:Sample_NCI-0082_E_C6R72ANXX PL:Illumina LB:Sample_NCI-0082_E_C6R72ANXX SM:Sample_NCI-0082_E_C6R72ANXX
and version info
$ java -jar /data/khanlab/apps/jvarkit/dist/fixvcfmissinggenotypes.jar --help
Description: After a VCF-merge, read a VCF, look back at some BAMS to tells if the missing genotypes were homozygotes-ref or not-called. If the number of reads is greater than min.depth, then the missing genotypes is said hom-ref.
Author : Pierre Lindenbaum PhD.
Mail : plindenbaum@yahoo.fr
WWW : https://github.com/lindenb/jvarkit/wiki/FixVcfMissingGenotypes
Compilation : 2016-12-01:08-12-00
Git-Hash : 31949a5be3c9948eb6d6fa72a96e8cbcbc66796d
Htsjdk-version : 2.6.1
Htsjdk-home : lib/com/github/samtools/htsjdk/2.6.1/htsjdk-2.6.1.jar
Just an idea: can you please rename your file cmd to cmd.list and test again?
You got it. cmd.list did the trick. It would be useful to add this to the wiki page :)
Thanks a ton.
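For later readers, a small pre-flight check along the lines of what fixed it here. This is a rough sketch, not jvarkit code: it assumes, based only on this thread, that the file passed with -f is treated as a list of BAM paths when its name ends in .list, and it also checks that each listed path exists. The function name check_bam_list is hypothetical.

```python
from pathlib import Path


def check_bam_list(list_path):
    """Return a list of human-readable problems found in a BAM list file."""
    path = Path(list_path)
    problems = []
    # Assumption drawn from this thread: the list file name must end in '.list'.
    if path.suffix != ".list":
        problems.append(f"{path.name}: name does not end with '.list'")
    for raw in path.read_text().splitlines():
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        if not Path(entry).is_file():
            problems.append(f"{entry}: no such file")
    return problems
```

An empty list means the file name and every listed path passed these two checks. It says nothing about the @RG/SM content of the BAMs themselves.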
Hi Pierre,
Is there a specific requirement for naming the BAM files when using fixvcfmissinggenotypes? Here are the command I ran and its log:
$ java -jar /apps/jvarkit/dist/fixvcfmissinggenotypes.jar -d 10 -f listout.vcf
[main] INFO jvarkit - Starting JOB at Thu Dec 01 11:11:27 EST 2016 com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes version=31949a5be3c9948eb6d6fa72a96e8cbcbc66796d built=2016-12-01:08-12-00
[main] INFO jvarkit - Command Line args : -d 10 -f cmd
[main] INFO jvarkit - Executing as patidarr@cn2698 on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_11-b12
[main] INFO jvarkit - Reading header for cmd
[main] INFO jvarkit - Adding 'java.io.tmpdir' directory to the list of tmp directories
[main] INFO jvarkit - Sample: Sample_NCI-0040_E_C6R72ANXX
[main] WARN jvarkit - No bam to fix sample Sample_NCI-0040_E_C6R72ANXX
[main] INFO jvarkit - done: N=11025
[main] INFO jvarkit - done sample Sample_NCI-0040_E_C6R72ANXX fixed=0 not-fixed=0 total=11025 genotypes
[main] INFO jvarkit - Sample: Sample_NCI-0082_E_C6R72ANXX
[main] WARN jvarkit - No bam to fix sample Sample_NCI-0082_E_C6R72ANXX
[main] INFO jvarkit - done: N=11025
My list file contains the BAM paths, with names like:
Sample_NCI-0040_E_C6R72ANXX.bam
/data/khanlab/projects/processed_DATA/NCI0082/DCEG/Sample_NCI-0082_E_C6R72ANXX/Sample_NCI-0082_E_C6R72ANXX.bwa.final.bam
Could you please let me know what I am missing here?
Thanks, Rajesh