luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
303 stars 38 forks

Octopus fails to run for some samples in an array job on HPC cluster #169

Closed QingliGuo closed 3 years ago

QingliGuo commented 3 years ago

Describe the bug

In summary, Octopus fails for some samples in an array job on an HPC cluster, and which samples fail changes at random each time the same array job is submitted.

More details: I submitted an array job to run mutation calling with Octopus on 22 chromosomes. VCFs were produced for some chromosomes (roughly 5-15) but not for the rest.

Please help!!

Version

$ octopus --version
octopus version 0.6.3-beta
Target: x86_64 Linux 5.4.0-1041-azure
Compiler: GNU 9.3.0
Boost: 1_74

Command

Command line to install octopus:

conda install -c conda-forge -c bioconda octopus

Command line to run octopus:

$octopus -R $ref --threads 5 -T $chr -I $tumour_bam --output $variant1_chr/"$sample"_"$chr".octopus.vcf

Additional context

/opt/sge_spool/8.5.4/nxn15/job_scripts/1516741: line 78: 1264 Illegal instruction $octopus -R $ref --threads 5 -T $chr -I $tumour_bam --output $variant1_chr/"$sample"_"$chr".octopus.vcf

'line 78' is the line in my shell script that contains the Octopus command shown above.

Extra comment: I used Octopus_0.5.2 earlier and it had a similar issue, with some tasks failing in array jobs. So I upgraded to the newer version, which has the same issue.
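An "Illegal instruction" crash generally means the binary executed a CPU instruction that the machine does not support. On a cluster with mixed hardware, that could explain why different tasks fail on each submission, although this is only a hypothesis here. A minimal sketch for checking it: log the node name and a few instruction-set flags at the start of each task, then compare failed and successful tasks. `log_node_info` is a hypothetical helper, and the script assumes a Linux node that exposes `/proc/cpuinfo` and an SGE scheduler that sets `SGE_TASK_ID`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the node name, the array task ID and the SIMD
# extensions the CPU reports, so failed tasks can be matched to node types.
log_node_info() {
    local flags=""
    if [ -r /proc/cpuinfo ]; then
        # Take the first "flags" line and keep only a few common extensions.
        flags=$(grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' \
                | grep -E '^(sse4_1|sse4_2|avx|avx2|avx512f)$' | sort -u | tr '\n' ' ')
    fi
    # SGE_TASK_ID is set by SGE for array jobs; "NA" is printed outside of one.
    echo "host=$(uname -n) task=${SGE_TASK_ID:-NA} cpu_flags=${flags:-none}"
}

log_node_info
```

If the failed tasks all land on nodes that lack a flag the successful nodes report, the conda binary may have been built for a newer CPU than some nodes provide.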

#### UPDATED INFORMATION ####

I tested Octopus_0.5.2 on an array job yesterday; 4 out of 1,863 jobs failed with the following error:

[2021-04-16 03:01:29] ------------------------------------------------------------------------
[2021-04-16 03:01:29] octopus v0.5.2-beta
[2021-04-16 03:01:29] Copyright (c) 2015-2018 University of Oxford
[2021-04-16 03:01:29] ------------------------------------------------------------------------
[2021-04-16 03:01:51] Failed to create temporary directory "/path-to-my-dir/Analysis/octopus-temp-13" - check permissions
[2021-04-16 03:01:51] An unclassified error has occurred:
[2021-04-16 03:01:51]
[2021-04-16 03:01:51] Std::bad_alloc.
[2021-04-16 03:01:51]
[2021-04-16 03:01:51] To help resolve this error submit an error report.
[2021-04-16 03:01:51] ------------------------------------------------------------------------

I re-ran the job on those 4 failed samples, and all of them succeeded in that second, separate submission. The failure rate is much lower in version 0.5.2 than in version 0.6.3, but the problem still exists.

I suspect the "octopus-temp-\d+" folder is involved in this error.
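Since the failures move around between submissions, it helps to list the tasks that need resubmitting instead of checking outputs by hand. A rough sketch, assuming outputs follow the `<dir>/<sample>_<chr>.octopus.vcf` pattern from the command above and all land in one directory; `missing_chromosomes` is a hypothetical helper, not part of Octopus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every chromosome whose output VCF is missing
# or empty. Usage: missing_chromosomes <output_dir> <sample> <chr>...
missing_chromosomes() {
    local out_dir=$1 sample=$2
    shift 2
    local chr
    for chr in "$@"; do
        # -s is true only when the file exists and is non-empty.
        [ -s "${out_dir}/${sample}_${chr}.octopus.vcf" ] || echo "$chr"
    done
}

# Example (variables as in the job script):
# missing_chromosomes "$variant1_chr" "$sample" $(seq 1 22)
```

A crashed run could leave a partial, non-empty VCF behind, so this check only catches tasks that produced nothing; the job's exit status or log is still worth inspecting.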
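One way to test that suspicion: the log shows the temporary folder being created under the analysis directory, so concurrent array tasks started from the same directory may be competing for the same "octopus-temp-N" names. The sketch below gives each task its own working directory before calling Octopus. It assumes SGE sets `JOB_ID` and `SGE_TASK_ID`, and that Octopus places its temporary folder in the current working directory (inferred from the log path, not confirmed); `task_workdir` and `enter_task_workdir` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a directory name unique to this array task.
# Falls back to "manual" and the shell PID when run outside SGE.
task_workdir() {
    local base=$1
    echo "${base}/octopus-work-${JOB_ID:-manual}-${SGE_TASK_ID:-$$}"
}

# Hypothetical helper: create that directory and make it the working directory.
enter_task_workdir() {
    local dir
    dir=$(task_workdir "$1") || return 1
    mkdir -p "$dir" && cd "$dir"
}

# Example (use absolute paths for $ref, $tumour_bam and the output, because
# the working directory changes):
# enter_task_workdir "/path-to-my-dir/Analysis" && \
#   $octopus -R $ref --threads 5 -T $chr -I $tumour_bam \
#     --output $variant1_chr/"$sample"_"$chr".octopus.vcf
```

If the failures disappear with per-task directories, a temp-folder collision becomes the likely cause; if `Std::bad_alloc` still appears, the node's memory limit is worth checking as well.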

dancooke commented 3 years ago

Sorry for the delay in responding to this. Please can you update to v0.7.4 on Bioconda and try your command again?