hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space #352

Closed FrancescaMiccolis closed 1 year ago

FrancescaMiccolis commented 1 year ago

Hi, I'm trying to use Linx version 1.21 in a Singularity environment. I'm running the following command: `linx -sample $sample -ref_genome_version 38 -sv_vcf $vcf_file -purple_dir $path_of_dir -output_dir $outputpath -ensembl_data_dir $path_to_ensembldata -check_fusions -known_fusion_file $path_to_csv`. When I run Linx I get the following error:

23:48:25 - [INFO ] - running SV analysis for $sampleid
23:48:25 - [INFO ] - loaded known fusion data: KNOWN_PAIR(410), IG_KNOWN_PAIR(20), EXON_DEL_DUP(14), PROMISCUOUS_3(33), IG_PROMISCUOUS(2), PROMISCUOUS_5(31)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOfRange(Arrays.java:3822)
    at java.base/java.lang.StringLatin1.newString(StringLatin1.java:769)
    at java.base/java.lang.String.substring(String.java:2712)
    at htsjdk.tribble.util.ParsingUtils.split(ParsingUtils.java:259)
    at htsjdk.tribble.util.ParsingUtils.split(ParsingUtils.java:222)
    at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:757)
    at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:121)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
    at htsjdk.variant.variantcontext.GenotypesContext.get(GenotypesContext.java:417)
    at htsjdk.variant.variantcontext.VariantContext.getGenotype(VariantContext.java:1102)
    at com.hartwig.hmftools.common.sv.StructuralVariantFactory.setLegCommon(StructuralVariantFactory.java:506)
    at com.hartwig.hmftools.common.sv.StructuralVariantFactory.create(StructuralVariantFactory.java:292)
    at com.hartwig.hmftools.common.sv.StructuralVariantFactory.addVariantContext(StructuralVariantFactory.java:132)
    at com.hartwig.hmftools.common.sv.StructuralVariantFileLoader$$Lambda$114/0x0000000800d70ac0.accept(Unknown Source)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at com.hartwig.hmftools.common.sv.StructuralVariantFileLoader.fromFile(StructuralVariantFileLoader.java:25)
    at com.hartwig.hmftools.linx.SvFileLoader.loadSvDataFromVcf(SvFileLoader.java:47)
    at com.hartwig.hmftools.linx.SvFileLoader.loadSampleSvDataFromFile(SvFileLoader.java:38)
    at com.hartwig.hmftools.linx.SampleAnalyser.processSample(SampleAnalyser.java:203)
    at com.hartwig.hmftools.linx.SampleAnalyser.processSamples(SampleAnalyser.java:170)
    at com.hartwig.hmftools.linx.LinxApplication.<init>(LinxApplication.java:207)
    at com.hartwig.hmftools.linx.LinxApplication.main(LinxApplication.java:276)

Is there a way to set memory options to avoid this problem? Thanks in advance.

charlesshale commented 1 year ago

You can control or increase the allocated memory using an option such as this:

java -Xmx48G -jar ~/tools/linx.jar

Could you tell me how many PASS or PON variants are in the VCF? Was it produced by our pipeline, i.e. with Gridss, Gripss and Purple run beforehand? The largest number of variants I have run through Linx is about 3000, and it completes without issue at that level.
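One way to get the counts asked about above is to tally VCF records by their FILTER value (column 7). This is only a minimal sketch, assuming a standard tab-separated VCF that may be gzip-compressed; the function name `count_by_filter` is made up for this example and is not part of hmftools.

```shell
# Count VCF records per FILTER value (column 7), e.g. PASS or PON.
# Header lines (starting with '#') are skipped.
# Handles plain-text and gzip-compressed (.gz) files.
count_by_filter() {
    vcf="$1"
    case "$vcf" in
        *.gz) gzip -dc "$vcf" ;;
        *)    cat "$vcf" ;;
    esac |
        awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' |
        sort
}
```

Calling `count_by_filter "$vcf_file"` prints one line per FILTER value followed by its record count, which should be enough to compare against the sample sizes mentioned in this thread.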

Can you add the '-log_debug' argument to the command and send me the resulting log file up to the OOM error? I will then try to understand what is going on. Email me directly at c.shale@hartwigmedicalfoundation.nl if that's preferable.

thanks.
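To capture a log that can be sent on, the console output can be redirected to a file. The sketch below assumes Linx writes its log and the stack trace to the console, as the output quoted in this thread suggests; `run_and_log` is a hypothetical helper name, not a Linx feature.

```shell
# Run a command, save stdout and stderr to a log file, then print the log.
# The exit status of the wrapped command is returned to the caller.
run_and_log() {
    log_file="$1"
    shift
    # Merge stderr into stdout so the OOM stack trace lands in the same file.
    "$@" > "$log_file" 2>&1
    status=$?
    cat "$log_file"
    return "$status"
}
```

For example, `run_and_log linx_debug.log linx -sample "$sample" -log_debug` followed by the remaining arguments of the original command would leave everything up to the error in `linx_debug.log`.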

FrancescaMiccolis commented 1 year ago

Thanks for your answer. I'm running Linx on GRIDSS output, in a Singularity environment pulled from https://depot.galaxyproject.org/singularity/hmftools-linx:1.21--hdfd78af_0. To set the memory option, I started the container and located the Linx jar file. I then changed the command from `linx -sample $sample -ref_genome_version 38 -sv_vcf $vcf_file -purple_dir $path_of_dir -output_dir $outputpath -ensembl_data_dir $path_to_ensembldata -check_fusions -known_fusion_file $path_to_csv` to `java -Xmx48G -jar /usr/local/share/hmftools-linx-1.21-0/linx.jar -sample $sample -ref_genome_version 38 -sv_vcf $vcf_file -purple_dir $path_of_dir -output_dir $outputpath -ensembl_data_dir $path_to_ensembldata -check_fusions -known_fusion_file $path_to_csv`. Now it seems to work without problems.
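The workaround described here amounts to calling the jar directly so that the JVM `-Xmx` heap flag can be set. It could be wrapped roughly as follows. This is a sketch only: `run_linx`, `LINX_JAR`, `LINX_HEAP` and `JAVA_BIN` are names invented for this example, and the jar location has to be supplied by the user.

```shell
# Hypothetical wrapper: run the Linx jar with an explicit maximum heap size.
# LINX_JAR  - path to linx.jar inside the container (must be set by the user)
# LINX_HEAP - value passed to -Xmx (defaults to 48G, the value used in this thread)
# JAVA_BIN  - Java launcher to use (defaults to "java")
run_linx() {
    : "${LINX_JAR:?set LINX_JAR to the path of linx.jar}"
    "${JAVA_BIN:-java}" "-Xmx${LINX_HEAP:-48G}" -jar "$LINX_JAR" "$@"
}
```

With `LINX_JAR` set, `run_linx -sample "$sample" -ref_genome_version 38` plus the other arguments behaves like the direct `java -Xmx48G -jar` call, and a smaller heap can be tried by setting `LINX_HEAP`.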

charlesshale commented 1 year ago

Linx typically uses a relatively small amount of memory, so hitting this error is unusual. Some of our samples have up to 2,000 passing structural variants, and Linx runs fine with 8-16 GB of memory. Does your sample have many more SVs than that?

Can you email me the log from Linx with '-log_debug' enabled?

thanks.