rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0

Segfault #72

Closed saulpierotti closed 1 year ago

saulpierotti commented 1 year ago

Hi, I get the following segfault when running STITCH on a few of my chromosomes. I am running STITCH as part of a Nextflow pipeline, so part of the log is Nextflow-related. I suspected a memory issue, but the peak memory consumption reported by Nextflow and LSF is much lower than the allocated memory (see the bottom of the log).

The log is on Pastebin since it is too long to post here:

https://pastebin.com/t0PYzG5S

rwdavies commented 1 year ago

Thanks, do you have a version number for STITCH for this run?

Zilong-Li commented 1 year ago

Hey @saulpierotti-ebi, in my experience you can just rerun the job a couple of times, but not on a very busy server or an IO-heavy disk. This is potentially an issue with R.
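
A minimal sketch of the rerun workaround described above, assuming the job can be launched as a single command from a wrapper script. The function name `run_with_retries` and the demo command are hypothetical placeholders, not part of STITCH or Nextflow; substitute the real STITCH invocation.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run cmd until it exits with status 0, up to max_attempts times.

    Returns the 1-based number of the attempt that succeeded,
    or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        # On POSIX, a negative return code means the child was killed by a signal.
        print(
            f"attempt {attempt} failed with return code {result.returncode}",
            file=sys.stderr,
        )
    return None


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real STITCH job.
    demo_cmd = [sys.executable, "-c", "print('pretend this is the STITCH job')"]
    print(run_with_retries(demo_cmd))
```

This only papers over an intermittent crash; a job that fails on every attempt still needs a proper fix.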

saulpierotti commented 1 year ago

By re-running a couple of times I was able to get 22 of my 24 chromosomes to run, but the remaining ones (chr 9 and 14 in medaka fish) seem to always fail. I am using the STITCH installation from conda, version 1.6.6.

Here is the output of sessionInfo():

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Rocky Linux 8.5 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /hps/software/users/birney/saul/nextflow_software_cache/conda/env-cf5167503e10068f41f4fff4fb86854e/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] STITCH_1.6.6 rrbgen_0.0.6

loaded via a namespace (and not attached):
[1] compiler_4.2.1 tools_4.2.1    Rcpp_1.0.9

Zilong-Li commented 1 year ago

Just to confirm: did some of the failed jobs complete when you reran them?

rwdavies commented 1 year ago

Hey,

When you re-run, does it always crash at the same point, i.e. after 1_Welz_lane1BeW24D5_AAALLVCHV_BeW9_24_25_26_22s001603-1-1_Welz_lane1BeW24D5? Relatedly, are you able to narrow it down to a single bam file that fails repeatedly?

Assuming yes, it sure seems like a genuine bug, which I would be very interested in fixing. If you're willing to send me the relevant bam/cram file and posfile, and any other settings I need to set, I would welcome the opportunity to find and fix the bug.
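
One way to narrow down to a single failing file is bisection over the bam list, sketched below. It assumes exactly one input triggers the crash and that the crash is repeatable. `find_failing_input` and the `fails` callback are hypothetical names; in practice `fails` would run STITCH on the given subset and report whether it crashed.

```python
def find_failing_input(items, fails):
    """Return the single item that makes fails(subset) true, found by bisection.

    Assumes exactly one item triggers the failure and that fails() is
    deterministic. Returns None if the full list does not fail.
    """
    candidates = list(items)
    if not candidates or not fails(candidates):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the failure.
        candidates = half if fails(half) else candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    # Stand-in data (assumption): ten made-up file names, one marked as bad.
    bams = [f"sample_{i}.bam" for i in range(10)]
    bad = "sample_7.bam"
    print(find_failing_input(bams, lambda subset: bad in subset))
```

With N files this needs roughly log2(N) runs instead of N, which matters when each run takes hours.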

Thanks, Robbie

rwdavies commented 1 year ago

For what it's worth, I just saw a similar bug with one of my jobs running overnight, in the same function. There's no randomness in that function. When I re-ran on a different machine it was fine. So I really don't know what's going on. I wonder if it's something about different dynamic libraries being used on different hosts. @Zilong-Li not sure if you have a further opinion here
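
A rough sketch of how the dynamic-library hypothesis could be checked: save the output of `ldd` for the package's compiled shared object on each host, then diff the resolved paths. The helper names `parse_ldd` and `diff_libs` and the sample text are made up for illustration. Which file to run `ldd` on is also an assumption; R reports the package's library directory via `system.file("libs", package = "STITCH")`.

```python
def parse_ldd(output):
    """Map each library name in ldd output to the path it resolved to."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # skip entries without a resolved path, e.g. linux-vdso
        name, _, rest = line.partition("=>")
        # Drop the trailing load address, e.g. "(0x00007f...)".
        libs[name.strip()] = rest.split("(")[0].strip()
    return libs


def diff_libs(a, b):
    """Return {name: (path_on_a, path_on_b)} for libraries that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    # Made-up sample output (assumption), standing in for two real hosts.
    host_a = (
        "\tlibopenblas.so.0 => /env/lib/libopenblas.so.0 (0x00007f0000000000)\n"
        "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000100000)\n"
    )
    host_b = (
        "\tlibopenblas.so.0 => /usr/lib64/libopenblas.so.0 (0x00007f0000000000)\n"
        "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000100000)\n"
    )
    print(diff_libs(parse_ldd(host_a), parse_ldd(host_b)))
```

Identical paths can still point at different library versions, so an empty diff does not rule the hypothesis out.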

Zilong-Li commented 1 year ago

Hi @rwdavies, there is a similar question on SO: https://stackoverflow.com/questions/49190251/caught-segfault-memory-not-mapped-error-in-r. It could be due to a different dynamic library. I haven't looked into the core dump file, so I'm not sure which lib caused it. It could be on the Rcpp side.

Zilong-Li commented 1 year ago

Version 1.6.7 should have this fixed.