Closed saulpierotti closed 1 year ago
Thanks, do you have a version number for STITCH for this run?
Hey @saulpierotti-ebi, in my experience you can just rerun the job a couple of times, but not on a very busy server or an I/O-heavy disk. This is potentially an issue on the R side.
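A minimal sketch of the rerun workaround described above, assuming the job is launched from a bash script. The `retry` helper and the `run_stitch_chr9.R` script name in the usage comment are hypothetical and not part of STITCH. The helper only re-runs a failing command a bounded number of times; it does not address the underlying segfault.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_ATTEMPTS is reached.
# Returns 0 on success, otherwise the exit status of the last failed attempt.
retry() {
    local max_attempts="$1"
    shift
    local attempt
    local status=1
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        # Stop as soon as one attempt succeeds.
        "$@" && return 0
        status=$?
        echo "attempt ${attempt}/${max_attempts} failed with exit status ${status}" >&2
    done
    return "$status"
}

# Hypothetical usage (the script name is an assumption):
#   retry 3 Rscript run_stitch_chr9.R
```

When the job runs inside Nextflow, the `errorStrategy 'retry'` and `maxRetries` process directives offer a similar bounded rerun without a wrapper script.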
By re-running a couple of times I was able to get 22 of my 24 chromosomes to run, but the remaining ones (chr 9 and 14 in medaka fish) seem to always fail. I am using the STITCH installation from conda, version 1.6.6.
Here is the output of sessionInfo():
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Rocky Linux 8.5 (Green Obsidian)
Matrix products: default
BLAS/LAPACK: /hps/software/users/birney/saul/nextflow_software_cache/conda/env-cf5167503e10068f41f4fff4fb86854e/lib/libopenblasp-r0.3.21.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] STITCH_1.6.6 rrbgen_0.0.6
loaded via a namespace (and not attached):
[1] compiler_4.2.1 tools_4.2.1 Rcpp_1.0.9
Just to confirm: do some of the failed jobs complete when you rerun them? Is that the case for you?
Hey,
When you re-run, does it always crash at the same point, i.e. after
1_Welz_lane1BeW24D5_AAALLVCHV_BeW9_24_25_26_22s001603-1-1_Welz_lane1BeW24D5
Relatedly, are you able to narrow down to a single bam file that fails repeatedly?
Assuming yes, it sure seems like a genuine bug, which I would be very interested in fixing. If you're willing to send me the relevant bam/cram file and posfile, and any other settings I need to set, I would welcome the opportunity to find and fix the bug.
Thanks, Robbie
For what it's worth, I just saw a similar bug with one of my jobs running overnight, in the same function. There's no randomness in that function. When I re-ran on a different machine it was fine. So I really don't know what's going on. I wonder if it's something about different dynamic libraries being used on different hosts. @Zilong-Li, not sure if you have a further opinion here.
Hi @rwdavies, there is a similar question on SO: https://stackoverflow.com/questions/49190251/caught-segfault-memory-not-mapped-error-in-r. It could be due to a different dynamic library. I haven't looked into the core dump file, so I'm not sure which lib caused it. It could be the Rcpp side.
Version 1.6.7 should have this fixed.
Hi, I get the following segfault error when running STITCH on a few of my chromosomes. I am running STITCH as part of a Nextflow pipeline, so part of the log is Nextflow-related. I suspected a memory issue, but the peak memory consumption detected by Nextflow and LSF is much lower than the allocated memory (see the bottom of the log).
The log is on pastebin since it is too long:
https://pastebin.com/t0PYzG5S