hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
187 stars 58 forks source link

ISOFOX: Fail to generate the expected GCrRatio counts file for GRCh38 #369

Closed YingYa closed 1 year ago

YingYa commented 1 year ago

Could not generate the expected GCrRatio counts file for GRCh38 (while successful for GRCh37) with the following command: java -jar isofox.jar -functions EXPECTED_GC_COUNTS -output_dir /path_to_output_data/ -ref_genome /ref_genome/38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -ensembl_data_dir /dna_pipeline/v5_31/38/common/ensembl_data -read_length 151 -threads 48

The error infomation: java.util.concurrent.ExecutionException: htsjdk.samtools.SAMException: Unable to find entry for contig: 11 at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191) at com.hartwig.hmftools.common.utils.TaskExecutor.checkThreadCompletion(TaskExecutor.java:101) at com.hartwig.hmftools.common.utils.TaskExecutor.executeTasks(TaskExecutor.java:83) at com.hartwig.hmftools.isofox.Isofox.generateGcRatios(Isofox.java:440) at com.hartwig.hmftools.isofox.Isofox.runAnalysis(Isofox.java:124) at com.hartwig.hmftools.isofox.Isofox.main(Isofox.java:512) Caused by: htsjdk.samtools.SAMException: Unable to find entry for contig: 11 at htsjdk.samtools.reference.FastaSequenceIndex.getIndexEntry(FastaSequenceIndex.java:215) at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.getSubsequenceAt(AbstractIndexedFastaSequenceFile.java:177) at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:49) at com.hartwig.hmftools.common.genome.refgenome.RefGenomeSource.getBaseString(RefGenomeSource.java:41) at com.hartwig.hmftools.isofox.adjusts.GcRatioCounts.calcGcRatioFromReadRegions(GcRatioCounts.java:166) at com.hartwig.hmftools.isofox.adjusts.GcTranscriptCalculator.calculateTranscriptGcRatios(GcTranscriptCalculator.java:243) at com.hartwig.hmftools.isofox.adjusts.GcTranscriptCalculator.lambda$generateExpectedCounts$0(GcTranscriptCalculator.java:147) at java.base/java.util.ArrayList.forEach(ArrayList.java:1511) at com.hartwig.hmftools.isofox.adjusts.GcTranscriptCalculator.generateExpectedCounts(GcTranscriptCalculator.java:147) at com.hartwig.hmftools.isofox.adjusts.GcTranscriptCalculator.call(GcTranscriptCalculator.java:88) at com.hartwig.hmftools.isofox.adjusts.GcTranscriptCalculator.call(GcTranscriptCalculator.java:40) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) at java.base/java.lang.Thread.run(Thread.java:832)

charlesshale commented 1 year ago

Could you add this argument: -ref_genome_version V38

These generated files are already available on our GCP resources page: https://console.cloud.google.com/storage/browser/hmf-public/HMFtools-Resources/rna_pipeline/38/isofox/

YingYa commented 1 year ago

Yes, I could try to run with '-ref_genome_version V38'.

I have been downloaded the GCP resources. My read length is 100bp, should I generated new EXPECTED_TRANS_COUNTS and EXPECTED_GC_COUNTS files, or I could just used that from GCP resources?

charlesshale commented 1 year ago

Yes it is important to generate expected transcript counts based off the correct read length.

The GC counts are not read-length dependent.

YingYa commented 1 year ago

Thanks. I could run EXPECTED_GC_COUNTS with '-ref_genome_version V38' successfully.