lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

No bam to fix a sample in fixvcfmissinggenotypes.jar #68

Closed patidarr closed 7 years ago

patidarr commented 7 years ago

Hi Pierre,

Is there a specific requirement for naming the BAM files when using fixvcfmissinggenotypes? Here are the command I ran and its log:

$ java -jar /apps/jvarkit/dist/fixvcfmissinggenotypes.jar -d 10 -f list out.vcf
[main] INFO jvarkit - Starting JOB at Thu Dec 01 11:11:27 EST 2016 com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes version=31949a5be3c9948eb6d6fa72a96e8cbcbc66796d built=2016-12-01:08-12-00
[main] INFO jvarkit - Command Line args : -d 10 -f cmd
[main] INFO jvarkit - Executing as patidarr@cn2698 on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_11-b12
[main] INFO jvarkit - Reading header for cmd
[main] INFO jvarkit - Adding 'java.io.tmpdir' directory to the list of tmp directories
[main] INFO jvarkit - Sample: Sample_NCI-0040_E_C6R72ANXX
[main] WARN jvarkit - No bam to fix sample Sample_NCI-0040_E_C6R72ANXX
[main] INFO jvarkit - done: N=11025
[main] INFO jvarkit - done sample Sample_NCI-0040_E_C6R72ANXX fixed=0 not-fixed=0 total=11025 genotypes
[main] INFO jvarkit - Sample: Sample_NCI-0082_E_C6R72ANXX
[main] WARN jvarkit - No bam to fix sample Sample_NCI-0082_E_C6R72ANXX
[main] INFO jvarkit - done: N=11025

My list file contains the BAM paths, with names like:

Sample_NCI-0040_E_C6R72ANXX.bam
/data/khanlab/projects/processed_DATA/NCI0082/DCEG/Sample_NCI-0082_E_C6R72ANXX/Sample_NCI-0082_E_C6R72ANXX.bwa.final.bam

Could you please let me know what I am missing here?

Thanks, Rajesh

lindenb commented 7 years ago

BAMs must contain a Read Group (@RG) with the sample name (SN)

patidarr commented 7 years ago

Ahh, I do have an RG tag, but my sample name is stored as SM :(

@RG ID:Sample_NCI-0040_E_C6R72ANXX PL:Illumina LB:Sample_NCI-0040_E_C6R72ANXX SM:Sample_NCI-0040_E_C6R72ANXX

Any chance to accommodate this?

lindenb commented 7 years ago

Sorry, typo: the sample name is SM, not SN.

Is "Sample_NCI-0040_E_C6R72ANXX" the very same sample name in your VCF file?

patidarr commented 7 years ago

Yes, it is.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_NCI-0040_E_C6R72ANXX Sample_NCI-0082_E_C6R72ANXX
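The two conditions discussed so far (each BAM carries an @RG line with an SM tag, and that SM value equals a sample column name in the VCF) can be checked with a small script. The following is only a sketch, not jvarkit's own code: the function names are made up for illustration, and it assumes the header text is tab-separated, as `samtools view -H` and a VCF `#CHROM` line normally are.

```python
def read_group_samples(sam_header_text):
    """Return the set of SM values found on the @RG lines of a SAM header."""
    samples = set()
    for line in sam_header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after the record type look like TAG:value
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


def vcf_samples(chrom_line):
    """Return the sample names from the '#CHROM ...' header line of a VCF."""
    columns = chrom_line.rstrip("\n").split("\t")
    # Samples follow the 9 fixed columns (CHROM through FORMAT)
    return columns[9:]


def samples_without_bam(chrom_line, sam_headers):
    """List VCF samples that no BAM header declares in an @RG SM tag."""
    declared = set()
    for header in sam_headers:
        declared |= read_group_samples(header)
    return [name for name in vcf_samples(chrom_line) if name not in declared]
```

An empty list from `samples_without_bam` means every VCF sample has a matching read group, so a remaining "No bam to fix sample" warning would have to come from somewhere else, such as how the list of BAMs is passed to the tool.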

lindenb commented 7 years ago

The trace says you used a file named 'cmd'. Am I wrong?

[main] INFO jvarkit - Command Line args : -d 10 -f cmd

Can you please show me the output of:

cat cmd | xargs ls -la

patidarr commented 7 years ago

Sorry, there was a typo in the original command I posted here.

$ java -jar /data/khanlab/apps/jvarkit/dist/fixvcfmissinggenotypes.jar -d 10 -f cmd out.vcf
[main] INFO jvarkit - Starting JOB at Thu Dec 01 12:32:20 EST 2016 com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes version=31949a5be3c9948eb6d6fa72a96e8cbcbc66796d built=2016-12-01:08-12-00
[main] INFO jvarkit - Command Line args : -d 10 -f cmd
[main] INFO jvarkit - Executing as patidarr@cn2698 on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_11-b12
[main] INFO jvarkit - Reading header for cmd
[main] INFO jvarkit - Adding 'java.io.tmpdir' directory to the list of tmp directories
[main] INFO jvarkit - Sample: Sample_NCI-0040_E_C6R72ANXX
[main] WARN jvarkit - No bam to fix sample Sample_NCI-0040_E_C6R72ANXX
[main] INFO jvarkit - done: N=11025
[main] INFO jvarkit - done sample Sample_NCI-0040_E_C6R72ANXX fixed=0 not-fixed=0 total=11025 genotypes
[main] INFO jvarkit - Sample: Sample_NCI-0082_E_C6R72ANXX
[main] WARN jvarkit - No bam to fix sample Sample_NCI-0082_E_C6R72ANXX
[main] INFO jvarkit - done: N=11025

$ cat cmd | xargs ls -la
-rw-rw-r-- 1 patidarr khanlab 11479630128 Jun 21 12:15 /data/khanlab/projects/processed_DATA/NCI0082/DCEG/Sample_NCI-0082_E_C6R72ANXX/Sample_NCI-0082_E_C6R72ANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 10573464321 Jun 21 15:49 /data/khanlab/projects/processed_DATA/RMS003/DCEG/Sample_RMS003_E_C6R9WANXX/Sample_RMS003_E_C6R9WANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 9721901273 Jun 21 07:23 /data/khanlab/projects/processed_DATA/RMS004/DCEG/Sample_RMS004_E_C6R72ANXX/Sample_RMS004_E_C6R72ANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 9880370397 Jun 21 08:46 /data/khanlab/projects/processed_DATA/RMS006/DCEG/Sample_RMS006_E_C6R72ANXX_C6R9WANXX/Sample_RMS006_E_C6R72ANXX_C6R9WANXX.bwa.dd.bam
-rw-rw-r-- 1 patidarr khanlab 24930278775 Jun 21 21:27 /data/khanlab/projects/processed_DATA/RMS006/DCEG/Sample_RMS006_E_C6R72ANXX_C6R9WANXX/Sample_RMS006_E_C6R72ANXX_C6R9WANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 9975663826 Jun 20 20:38 /data/khanlab/projects/processed_DATA/RMS007/DCEG/Sample_RMS007_E_C6R72ANXX/Sample_RMS007_E_C6R72ANXX.bwa.final.bam
-rw-rw-r-- 1 patidarr khanlab 10558785239 Jun 21 08:51 /data/khanlab/projects/processed_DATA/RMS008/DCEG/Sample_RMS008_E_C6R72ANXX_C6R9WANXX/Sample_RMS008_E_C6R72ANXX_C6R9WANXX.bwa.dd.bam

patidarr commented 7 years ago

And here is the RG tag:

$ samtools view -H /data/khanlab/projects/processed_DATA/NCI0082/DCEG/Sample_NCI-0082_E_C6R72ANXX/Sample_NCI-0082_E_C6R72ANXX.bwa.final.bam |grep RG
@RG ID:Sample_NCI-0082_E_C6R72ANXX PL:Illumina LB:Sample_NCI-0082_E_C6R72ANXX SM:Sample_NCI-0082_E_C6R72ANXX

And the version info:

$ java -jar /data/khanlab/apps/jvarkit/dist/fixvcfmissinggenotypes.jar --help

FixVcfMissingGenotypes

Description: After a VCF-merge, read a VCF, look back at some BAMS to tells if the missing genotypes were homozygotes-ref or not-called. If the number of reads is greater than min.depth, then the missing genotypes is said hom-ref.

Author : Pierre Lindenbaum PhD.
Mail : plindenbaum@yahoo.fr
WWW : https://github.com/lindenb/jvarkit/wiki/FixVcfMissingGenotypes
Compilation : 2016-12-01:08-12-00
Git-Hash : 31949a5be3c9948eb6d6fa72a96e8cbcbc66796d
Htsjdk-version : 2.6.1
Htsjdk-home : lib/com/github/samtools/htsjdk/2.6.1/htsjdk-2.6.1.jar
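
For readers unfamiliar with the tool, the rule stated in the help text above (a missing genotype is called hom-ref when the number of reads is greater than min.depth) can be pictured as follows. This is only a reading of that one sentence, not the tool's actual implementation; the function name and the genotype strings are illustrative assumptions.

```python
def fix_missing_genotype(read_depth, min_depth):
    """Apply the rule from the help text to one missing genotype.

    Returns '0/0' (hom-ref) when the BAM shows more reads than min_depth
    at the site, otherwise leaves the genotype as not called ('./.').
    """
    # "greater than" is taken literally from the description
    if read_depth > min_depth:
        return "0/0"
    return "./."
```

With `-d 10` as in the command above, this sketch would turn a missing genotype into hom-ref only at sites covered by at least 11 reads in that sample's BAM.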

lindenb commented 7 years ago

Just an idea: can you please rename your file cmd to cmd.list and test again?

patidarr commented 7 years ago

You got it. Renaming it to cmd.list did the trick. It would be useful to add this to the wiki page :)

Thanks a ton.
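
For anyone landing here with the same warning, a quick pre-flight check of the file passed to `-f` could look like the sketch below. The thread does not explain why the rename helped; the sketch simply assumes the file of paths needs a name ending in `.list`, and that each non-empty line is a path to an existing `.bam` file. The function name is hypothetical and this is not part of jvarkit.

```python
import os


def check_bam_list(list_path):
    """Return a list of human-readable problems found in a BAM list file."""
    problems = []
    # Assumption drawn from this thread: the list file name must end in .list
    if not list_path.endswith(".list"):
        problems.append(f"{list_path}: file name does not end with .list")
    with open(list_path) as handle:
        for raw in handle:
            bam = raw.strip()
            if not bam:
                continue  # ignore blank lines
            if not bam.endswith(".bam"):
                problems.append(f"{bam}: does not look like a BAM path")
            elif not os.path.isfile(bam):
                problems.append(f"{bam}: file not found")
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_bam_list(sys.argv[1]):
        print(problem)
```

Run against a file named `cmd`, it reports the naming problem even when all the listed BAMs exist, which is the situation in this issue.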