hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Amber executing basic command locate vcf.gz file that AMBER uses #439

Closed: gagank4911 closed this issue 11 months ago

gagank4911 commented 11 months ago

gagan_indx@icare-pipeline:/home/moubeen.fauzul/gagan$ java -Xmx16G -cp amber-3.9.jar com.hartwig.hmftools.amber.AmberApplication -reference VSI_15_S24 -reference_bam VSI_15_S24_sentieon_match_4.1_deduped.bam -tumor 1760_FS28 -tumor_bam 1760_FS28_sentieon_match_4.1_deduped.bam -output_dir /amber_output -loci Output_RUN55_1760_FS28_1760_FS28_Variant_Call_1760_FS28_TNscope.vcf -ref_genome_version 38
07:46:24.396 INFO [main] - AMBER version: 3.9
07:46:24.398 INFO [main] - Loading vcf file Output_RUN55_1760_FS28_1760_FS28_Variant_Call_1760_FS28_TNscope.vcf
07:46:25.112 INFO [main] - loaded 16331 baf loci
07:46:25.147 INFO [main] - Processing 16331 potential sites in reference bam VSI_15_S24_sentieon_match_4.1_deduped.bam
Exception in thread "main" java.lang.IllegalArgumentException: No enum constant com.hartwig.hmftools.common.amber.BaseDepthData.Base.ACT
    at java.base/java.lang.Enum.valueOf(Enum.java:273)
    at com.hartwig.hmftools.common.amber.BaseDepthData$Base.valueOf(BaseDepthData.java:11)
    at com.hartwig.hmftools.common.amber.BaseDepthFactory.fromAmberSite(BaseDepthFactory.java:45)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
    at com.google.common.collect.CollectSpliterators$FlatMapSpliterator.lambda$forEachRemaining$1(CollectSpliterators.java:377)
    at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1779)
    at com.google.common.collect.CollectSpliterators$FlatMapSpliterator.forEachRemaining(CollectSpliterators.java:373)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
    at com.hartwig.hmftools.amber.AmberGermline.germlineDepth(AmberGermline.java:101)
    at com.hartwig.hmftools.amber.AmberGermline.(AmberGermline.java:65)
    at com.hartwig.hmftools.amber.AmberApplication.runNormalMode(AmberApplication.java:102)
    at com.hartwig.hmftools.amber.AmberApplication.run(AmberApplication.java:79)
    at com.hartwig.hmftools.amber.AmberApplication.main(AmberApplication.java:238)

I tried running the command by following the code provided on GitHub, but I am getting errors.

hongwingl commented 11 months ago

The vcf.gz file that AMBER expects for -loci is a file listing germline heterozygous sites across the genome, not a sample's own variant calls. See

https://github.com/hartwigmedical/hmftools/tree/master/amber#:~:text=The%20vcf%20file,yields%207.25M%20sites.

You will need to download it from our resource files, as described in the link above, and pass it to AMBER as the -loci input.
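In the first run, AMBER read the allele ACT where it expects a single base, hence the "No enum constant ...Base.ACT" error. As a quick pre-check before passing a VCF to -loci, here is a minimal shell sketch that counts records whose REF or any ALT allele is not exactly one base. The function name count_non_snv is hypothetical and not part of hmftools; a zero count only rules out this particular error and does not make a sample's own calls a substitute for the heterozygous-site file.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMBER): count VCF records whose REF or any
# ALT allele is not a single base, e.g. indels such as A -> ACT.
count_non_snv() {
  vcf="$1"
  # Read plain or gzip-compressed VCFs.
  case "$vcf" in
    *.gz) gzip -dc "$vcf" ;;
    *) cat "$vcf" ;;
  esac | awk -F'\t' '
    /^#/ { next }                      # skip meta-information and header lines
    {
      bad = (length($4) != 1)          # REF must be one base
      n = split($5, alts, ",")         # ALT may list several alleles
      for (i = 1; i <= n; i++)
        if (length(alts[i]) != 1) bad = 1
      if (bad) count++
    }
    END { print count + 0 }
  '
}
```

For example, count_non_snv Output_RUN55_1760_FS28_1760_FS28_Variant_Call_1760_FS28_TNscope.vcf would report how many records in that file carry multi-base alleles.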

gagank4911 commented 11 months ago

Thank you for the quick reply. I was not able to locate the vcf.gz file; could you share the exact location of the files?

hongwingl commented 11 months ago

If you download this file https://storage.googleapis.com/hmf-public/HMFtools-Resources/dna_pipeline/v5_32/38/hmf_dna_pipeline_resources.38_v5.32.gz

Uncompress using tar -xzvf hmf_dna_pipeline_resources.38_v5.32.gz

Under the copy_number folder, you will find a file called GermlineHetPon.38.vcf.gz. This is the file you will need. Hope this helps.
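The download, extract and locate steps can be scripted. A minimal sketch, assuming curl, tar and find are available; fetch_resources and locate_het_pon are hypothetical helper names, and the URL is the v5.32 GRCh38 bundle quoted in this thread, so other versions or builds would need their own URL.

```shell
#!/bin/sh
# URL of the GRCh38 v5.32 resource bundle quoted in this thread.
RESOURCE_URL="https://storage.googleapis.com/hmf-public/HMFtools-Resources/dna_pipeline/v5_32/38/hmf_dna_pipeline_resources.38_v5.32.gz"

# Hypothetical helper: download the bundle into directory $1 and unpack it there.
fetch_resources() {
  dest="${1:-.}"
  mkdir -p "$dest" || return 1
  curl -fL -o "$dest/hmf_dna_pipeline_resources.38_v5.32.gz" "$RESOURCE_URL" || return 1
  # The bundle is a gzipped tar archive, so it is unpacked with tar, not gunzip.
  tar -xzf "$dest/hmf_dna_pipeline_resources.38_v5.32.gz" -C "$dest"
}

# Hypothetical helper: print the GermlineHetPon file for genome version $2
# (e.g. 37 or 38) found under a copy_number folder below directory $1.
locate_het_pon() {
  find "$1" -type f -path "*/copy_number/GermlineHetPon.$2.vcf.gz" | head -n 1
}
```

Usage: run fetch_resources hmf_resources, then pass the path printed by locate_het_pon hmf_resources 38 to -loci.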

gagank4911 commented 11 months ago

Thank you.

gagank4911 commented 11 months ago

:~/gagan/test_folder_amber$ java -Xmx16G -cp amber-3.9.jar com.hartwig.hmftools.amber.AmberApplication -reference COLO829R -reference_bam COLO829R.bam -tumor COLO829T -tumor_bam COLO829T.bam -output_dir /amber_output/ -ref_genome_version 38 -threads 8 -loci GermlineHetPon.38.vcf.gz
08:59:48.183 INFO [main] - AMBER version: 3.9
08:59:48.186 INFO [main] - Loading vcf file GermlineHetPon.38.vcf.gz
09:00:01.573 INFO [main] - loaded 7249216 baf loci
09:00:02.164 INFO [main] - Processing 7249216 potential sites in reference bam COLO829R.bam
09:00:03.671 INFO [main] - 7249216 loci, 145066 genome regions, min gap = 2000
Exception in thread "worker-0" Exception in thread "worker-1" Exception in thread "worker-2" Exception in thread "worker-3" Exception in thread "worker-4" Exception in thread "worker-5" Exception in thread "worker-6" Exception in thread "worker-7"
09:00:05.304 INFO [main] - 8 bam reader threads started
java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
    at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69)
java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
    at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69)
java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
    at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69)
java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
    at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69)
java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
    at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69)
java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
    at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69)
java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
    at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69)
java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
    at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69)
09:00:05.415 INFO [main] - 8 bam reader threads finished
09:00:07.017 INFO [main] - Median normal depth is 0 reads: filtering reads outside of 0 and 0
09:00:07.550 INFO [main] - 0 heterozygous, 0 homozygous in reference bams
09:00:07.956 INFO [main] - Median normal depth is 0 reads: filtering reads outside of 0 and 0
09:00:08.083 INFO [main] - Processing 0 germline heterozygous loci in tumor bam COLO829T.bam
09:00:08.083 INFO [main] - Processing 0 germline homozygous loci in tumor bam COLO829T.bam for contamination
09:00:08.087 INFO [main] - 0 loci, 0 genome regions, min gap = 2000
09:00:08.116 INFO [main] - 8 bam reader threads started
09:00:08.116 INFO [main] - 8 bam reader threads finished
09:00:08.146 INFO [main] - Median tumor depth at potential contamination sites is 0 reads
09:00:08.173 INFO [main] - No evidence of contamination.
Exception in thread "main" java.nio.file.NoSuchFileException: /amber_output/COLO829T.amber.qc
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
    at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478)
    at java.base/java.nio.file.Files.newOutputStream(Files.java:220)
    at java.base/java.nio.file.Files.write(Files.java:3565)
    at java.base/java.nio.file.Files.write(Files.java:3616)
    at com.hartwig.hmftools.common.amber.qc.AmberQCFile.write(AmberQCFile.java:36)
    at com.hartwig.hmftools.amber.AmberPersistence.persistQC(AmberPersistence.java:57)
    at com.hartwig.hmftools.amber.AmberApplication.runNormalMode(AmberApplication.java:112)
    at com.hartwig.hmftools.amber.AmberApplication.run(AmberApplication.java:79)
    at com.hartwig.hmftools.amber.AmberApplication.main(AmberApplication.java:238)
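Independently of the genome-build problem, the last exception in this log shows that AMBER could not create /amber_output/COLO829T.amber.qc. Java's Files.write raises NoSuchFileException when the parent directory does not exist, and /amber_output sits at the filesystem root, where a regular user usually cannot create directories. A minimal pre-flight sketch, assuming that is the cause here (the thread does not confirm it); prepare_output_dir is a hypothetical helper that creates the output directory and checks that it is writable before AMBER starts.

```shell
#!/bin/sh
# Hypothetical helper: make sure the AMBER -output_dir exists and is writable.
prepare_output_dir() {
  out="$1"
  if ! mkdir -p "$out" 2>/dev/null; then
    echo "cannot create $out" >&2
    return 1
  fi
  if [ ! -w "$out" ]; then
    echo "$out is not writable" >&2
    return 1
  fi
  # Print the directory so it can be passed straight to -output_dir.
  echo "$out"
}
```

Usage: OUT=$(prepare_output_dir amber_output) && java -Xmx16G -cp amber-3.9.jar com.hartwig.hmftools.amber.AmberApplication ... -output_dir "$OUT" ..., with the remaining arguments as before. A relative path such as amber_output/, as in the later run, avoids needing write access to the root directory.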

hongwingl commented 11 months ago

COLO829T.bam is aligned to HG37 (GRCh37), so it will not work with GermlineHetPon.38.vcf.gz, which is for HG38.
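A mismatch like this can be spotted before running AMBER by comparing contig names. The "Invalid reference index -1" errors in the previous log are what htsjdk reports when a queried contig is not in the BAM's sequence dictionary. The sketch below assumes samtools is installed to dump the BAM header; missing_contigs is a hypothetical helper that prints contigs used in the loci VCF but absent from the BAM header. If it prints many names (for example chr-prefixed names against a BAM that uses unprefixed ones), the loci file and BAM most likely use different reference builds or naming conventions.

```shell
#!/bin/sh
# Hypothetical helper: list contigs that appear in a loci VCF ($1, plain or
# gzipped) but not as @SQ SN: entries in a SAM header text file ($2).
missing_contigs() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *) cat "$1" ;;
  esac | awk -F'\t' -v hdr="$2" '
    BEGIN {
      # Collect contig names from @SQ lines of the header, e.g. "SN:chr1".
      while ((getline line < hdr) > 0) {
        if (line ~ /^@SQ/) {
          n = split(line, f, "\t")
          for (i = 1; i <= n; i++)
            if (f[i] ~ /^SN:/) known[substr(f[i], 4)] = 1
        }
      }
      close(hdr)
    }
    /^#/ { next }
    !($1 in known) && !($1 in seen) { seen[$1] = 1; print $1 }
  '
}
```

Usage: samtools view -H COLO829T.bam > COLO829T.header.sam, then missing_contigs GermlineHetPon.38.vcf.gz COLO829T.header.sam | head. An empty result means every contig in the loci file is present in the BAM.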

gagank4911 commented 11 months ago

java -Xmx16G -cp amber-3.9.jar com.hartwig.hmftools.amber.AmberApplication -reference COLO829R -reference_bam COLO829R.bam -tumor COLO829T -tumor_bam COLO829T.bam -output_dir amber_output/ -ref_genome_version 38 -threads 8 -loci GermlineHetPon.37.vcf.gz
11:06:53.227 INFO [main] - AMBER version: 3.9
11:06:53.229 INFO [main] - Loading vcf file GermlineHetPon.37.vcf.gz
11:06:55.262 INFO [main] - loaded 1344545 baf loci
11:06:55.301 INFO [main] - Processing 1344545 potential sites in reference bam COLO829R.bam
11:06:56.001 INFO [main] - 1344545 loci, 148821 genome regions, min gap = 4000
11:06:56.079 INFO [main] - 8 bam reader threads started
11:06:57.063 INFO [main] - [####################] 100% complete
11:06:57.064 INFO [main] - 8 bam reader threads finished
11:06:57.315 INFO [main] - Median normal depth is 34 reads: filtering reads outside of 17 and 51
11:06:57.544 INFO [main] - 95 heterozygous, 119 homozygous in reference bams
11:06:57.646 INFO [main] - Median normal depth is 34 reads: filtering reads outside of 17 and 51
11:06:57.699 INFO [main] - Processing 95 germline heterozygous loci in tumor bam COLO829T.bam
11:06:57.700 INFO [main] - Processing 119 germline homozygous loci in tumor bam COLO829T.bam for contamination
11:06:57.704 INFO [main] - 214 loci, 75 genome regions, min gap = 2000
11:06:57.741 INFO [main] - 8 bam reader threads started
11:06:57.809 INFO [main] - [####################] 100% complete
11:06:57.845 INFO [main] - 8 bam reader threads finished
11:06:57.852 INFO [main] - Median tumor depth at potential contamination sites is 138 reads
11:06:57.857 INFO [main] - No evidence of contamination.
11:06:57.871 INFO [main] - Writing 3 contamination records to amber_output//COLO829T.amber.contamination.vcf.gz
11:06:58.116 INFO [main] - Writing 1114 germline snp records to amber_output//COLO829R.amber.snp.vcf.gz
11:06:58.306 INFO [main] - Applying pcf segmentation
11:06:58.321 INFO [main] - Executing R script via command: Rscript /tmp/script17528295923280437331.R amber_output//COLO829T.amber.baf.tsv.gz amber_output//COLO829T.amber.baf.pcf
11:06:58.833 FATAL [main] - Error executing R script. Examine error file /tmp/bafSegmentation.R9978346601299744466.error for details.
Exception in thread "main" java.io.IOException: R execution failed. Unable to complete segmentation.
    at com.hartwig.hmftools.amber.BAFSegmentation.applySegmentation(BAFSegmentation.java:23)
    at com.hartwig.hmftools.amber.AmberPersistence.persistBAF(AmberPersistence.java:46)
    at com.hartwig.hmftools.amber.AmberApplication.runNormalMode(AmberApplication.java:116)
    at com.hartwig.hmftools.amber.AmberApplication.run(AmberApplication.java:79)
    at com.hartwig.hmftools.amber.AmberApplication.main(AmberApplication.java:238)

hongwingl commented 11 months ago

Did you install R and its dependencies? See https://github.com/hartwigmedical/hmftools/tree/master/amber#:~:text=The%20Bioconductor%20copynumber%20package%20is%20required%20for%20segmentation.%20After%20installing%20R%20or%20RStudio%2C%20the%20copy%20number%20package%20can%20be%20added%20with%20the%20following%20R%20commands%3A

gagank4911 commented 11 months ago

I installed it by running these commands in the R console:

if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("copynumber")

hongwingl commented 11 months ago

Can you show me the content of this file? /tmp/bafSegmentation.R9978346601299744466.error

gagank4911 commented 11 months ago

I had a version issue, so I had to reinstall dplyr, S4Vectors, IRanges and GenomicRanges:

install.packages("dplyr")
BiocManager::install("S4Vectors")
BiocManager::install("IRanges")
BiocManager::install("GenomicRanges")

For copynumber, I cloned the package with

git clone https://git.bioconductor.org/packages/copynumber

and installed it from source with

install.packages("C:/path to folder with the package copynumber", repos = NULL, type = "source")

After that it worked. Thank you!
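AMBER runs its segmentation step through Rscript (see the "Executing R script via command: Rscript ..." log line above), so these packages must load in the R installation that the Rscript on the PATH points to. A minimal sketch to confirm that before re-running AMBER; check_r_packages is a hypothetical helper, and the package list used below is the one mentioned in this thread, which may not be the complete set AMBER's script needs.

```shell
#!/bin/sh
# Hypothetical helper: try to load each named R package with Rscript and
# report "ok" or "missing" (a missing Rscript also counts as missing).
check_r_packages() {
  for pkg in "$@"; do
    if Rscript -e "suppressPackageStartupMessages(library(${pkg}))" >/dev/null 2>&1; then
      echo "ok: ${pkg}"
    else
      echo "missing: ${pkg}"
    fi
  done
}
```

Usage: check_r_packages copynumber dplyr S4Vectors IRanges GenomicRanges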