hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
188 stars 58 forks source link

AMBER Implementaion #331

Closed ZWael closed 1 year ago

ZWael commented 1 year ago

Hello I am trying to use AMBER ans I'am not familiar to the java language, I think i have truble with parallelisation and workers here are my steps

1) I created a conda env and installed maven 2) download the amber-3.9.jar (https://github.com/hartwigmedical/hmftools/releases/download/amber-v3.9/amber-3.9.jar) in my local machine 3) download the test files COLO829R.bam and COLO829T.bam from the repo github.com/hartwigmedical/hmftools/tree/master/pipeline/test_data/ 4) download the GermlineHetPon.38.vcf.gz from the nextcloud folder FHMFTools-Resources>Amber>38 j’exécute la commande :

java -Xmx32G -cp AMBER/amber-3.9.jar com.hartwig.hmftools.amber.AmberApplication -reference COLO829R -reference_bam test/COLO829R.bam -tumor COLO829T -tumor_bam test/COLO829T.bam -output_dir out -loci Resources/GermlineHetPon.38.vcf.gz -ref_genome_version 38 the output message was

13:33:05.454 INFO [main] - AMBER version: 3.9 13:33:05.457 INFO [main] - Loading vcf file Resources/GermlineHetPon.38.vcf.gz 13:33:10.131 INFO [main] - loaded 1337724 baf loci 13:33:10.193 INFO [main] - Processing 1337724 potential sites in reference bam test/COLO829R.bam 13:33:12.436 INFO [main] - 1337724 loci, 148424 genome regions, min gap = 4000 13:33:12.476 INFO [main] - 1 bam reader threads started Exception in thread "worker-0" java.lang.IllegalArgumentException: Invalid reference index -1 at htsjdk.samtools.QueryInterval.(QueryInterval.java:24) at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538) at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400) at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69) 13:33:12.483 INFO [main] - 1 bam reader threads finished 13:33:12.984 INFO [main] - Median normal depth is 0 reads: filtering reads outside of 0 and 0 13:33:13.336 INFO [main] - 0 heterozygous, 0 homozygous in reference bams 13:33:13.485 INFO [main] - Median normal depth is 0 reads: filtering reads outside of 0 and 0 13:33:13.527 INFO [main] - Processing 0 germline heterozygous loci in tumor bam test/COLO829T.bam 13:33:13.528 INFO [main] - Processing 0 germline homozygous loci in tumor bam test/COLO829T.bam for contamination 13:33:13.531 INFO [main] - 0 loci, 0 genome regions, min gap = 2000 13:33:13.537 INFO [main] - 1 bam reader threads started 13:33:13.538 INFO [main] - 1 bam reader threads finished 13:33:13.546 INFO [main] - Median tumor depth at potential contamination sites is 0 reads 13:33:13.554 INFO [main] - No evidence of contamination. 13:33:13.581 INFO [main] - Writing 0 contamination records to out/COLO829T.amber.contamination.vcf.gz 13:33:13.690 INFO [main] - Writing 1101 germline snp records to out/COLO829R.amber.snp.vcf.gz 13:33:14.010 INFO [main] - Applying pcf segmentation 13:33:14.020 INFO [main] - Executing R script via command: Rscript /tmp/script2398773644644405398.R out/COLO829T.amber.baf.tsv.gz out/COLO829T.amber.baf.pcf

toddajohnson commented 1 year ago

If you run samtools view -H COLO829R.bam > check.sam against one of those test bam files, you will see in the output header that the reference genome used for mapping is "/opt/resources/reference_genome/37/Homo_sapiens.GRCh37.GATK.illumina.fasta"; so, that test data was mapped to build 37 instead of 38, but you are telling amber to use 38 and using build 38 GermlineHetPon. Although I presume that your own data will be mapped will be mapped to the more recent GRCh38, to practice with this test data, you will need to download the build 37 pipeline reference data. The reference used by the test data is mentioned on "https://github.com/hartwigmedical/hmftools/blob/master/pipeline/README_WGS.md" as being GRCh37 under the "Test Data" section. I guess that HMF has lots of people use their resources bandwidth just to download "old" reference data to test the pipeline... :-)

ZWael commented 1 year ago

Hello @toddajohnson Thank you for the reply As you suggested I downloaded the GermlineHetPon.37.vcf.gz file , and changed the command accordingly java -Xmx32G -cp AMBER/amber-3.9.jar com.hartwig.hmftools.amber.AmberApplication -reference COLO829R -reference_bam test/COLO829R.bam -tumor COLO829T -tumor_bam test/COLO829T.bam -output_dir out -loci Resources/GermlineHetPon.37.vcf.gz -ref_genome_version 37 17:06:24.130 INFO [main] - AMBER version: 3.9 17:06:24.133 INFO [main] - Loading vcf file Resources/GermlineHetPon.37.vcf.gz 17:06:28.286 INFO [main] - loaded 1344545 baf loci 17:06:28.347 INFO [main] - Processing 1344545 potential sites in reference bam test/COLO829R.bam 17:06:30.482 INFO [main] - 1344545 loci, 148821 genome regions, min gap = 4000 17:06:30.518 INFO [main] - 1 bam reader threads started Exception in thread "worker-0" htsjdk.samtools.SAMException: No index is available for this BAM file. at htsjdk.samtools.BAMFileReader.getIndex(BAMFileReader.java:411) at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952) at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612) at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533) at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538) at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400) at com.hartwig.hmftools.amber.AsyncBamLociReader$BamReaderThread.run(AsyncBamLociReader.java:69) 17:06:30.525 INFO [main] - 1 bam reader threads finished 17:06:30.804 INFO [main] - Median normal depth is 0 reads: filtering reads outside of 0 and 0 17:06:31.099 INFO [main] - 0 heterozygous, 0 homozygous in reference bams 17:06:31.296 INFO [main] - Median normal depth is 0 reads: filtering reads outside of 0 and 0 17:06:31.349 INFO [main] - Processing 0 germline heterozygous loci in tumor bam test/COLO829T.bam 17:06:31.349 INFO [main] - Processing 0 germline homozygous loci in tumor bam test/COLO829T.bam for contamination 17:06:31.352 INFO [main] - 0 loci, 0 genome regions, min gap = 2000 17:06:31.354 INFO [main] - 1 bam reader threads started 17:06:31.360 INFO [main] - 1 bam reader threads finished 17:06:31.369 INFO [main] - Median tumor depth at potential contamination sites is 0 reads 17:06:31.376 INFO [main] - No evidence of contamination. 17:06:31.406 INFO [main] - Writing 0 contamination records to out/COLO829T.amber.contamination.vcf.gz 17:06:31.552 INFO [main] - Writing 1114 germline snp records to out/COLO829R.amber.snp.vcf.gz 17:06:31.879 INFO [main] - Applying pcf segmentation 17:06:31.885 INFO [main] - Executing R script via command: Rscript /tmp/script15322318469979531158.R out/COLO829T.amber.baf.tsv.gz out/COLO829T.amber.baf.pcf

p-priestley commented 1 year ago

You need to index your bams. eg.

samtools index COLO829R.bam

ZWael commented 1 year ago

hi @p-priestley yes it is working by adding index files to the test folder Thanks for hints