rwdavies / QUILT

GNU General Public License v3.0

scheduled core X encountered error in user code, all values of the job will be affected #12

Open ZheZhang-ZZ opened 2 years ago

ZheZhang-ZZ commented 2 years ago

Hi Robert,

I tried to use QUILT to impute BAMs with 0.8X to 1X coverage in different chunks of a chromosome. However, in some chunks it keeps reporting "scheduled core X encountered error in user code, all values of the job will be affected". I have tried reducing the number of cores used, but that did not help. Do you have any suggestions?

Best, Zhe

rwdavies commented 2 years ago

Hi Zhe,

Sorry for my slow reply, the start of term is very busy with teaching.

A few things to check:

1) RAM. If you use more than one core and the job as a whole runs out of RAM, individual R sub-processes will fail, and you will get an error like the one you see. To check whether this is the case, you can watch a running job using e.g. top or htop. Alternatively, if you're using a cluster job scheduler (like SGE), you can normally get information about completed jobs, including maximum RAM used and whether they hit memory limits.

2) Some error with your input data, principally the BAM files. I'm guessing this is unlikely, but if you've done some unusual processing, it could crop up. If you're able to reproducibly isolate an error to a single sample, then maybe try validating the BAM file using e.g. ValidateSamFile from Picard.

3) Some bug in my code. Entirely possible; there are likely still bugs in my code. Again, ideally you would isolate and reproduce the bug, and send me some minimal data so I can fix it (or work with me on printing more error messages so I can understand it, and then fix it). To look for this, try setting a seed (QUILT has a --seed option, or you can use set.seed() in R), then re-run without multiple cores (nCores=1). Keep trying different seeds until a BAM file reproducibly fails. Then try finding a seed for which running only that single BAM file fails. Ideally you would then make that data available to me, or, as I mentioned, we could discuss how to narrow down the bug so I can fix it.
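The isolation procedure in point 3 (one BAM at a time, one core, several seeds) could be scripted roughly as below. This is only a sketch and not part of QUILT: the function names and the `{bam}` / `{seed}` placeholders are made up for illustration, and the command template is something you would fill in with your own single-core QUILT invocation.

```python
import subprocess
from typing import Callable, Dict, List, Sequence


def find_reproducible_failures(
    bam_paths: Sequence[str],
    seeds: Sequence[int],
    run_one: Callable[[str, int], bool],
) -> Dict[str, List[int]]:
    """Run every (BAM, seed) pair and return {bam: [seeds that failed]}.

    BAMs that succeed for every seed are left out of the result.
    `run_one(bam, seed)` must return True on success and False on failure.
    """
    failures: Dict[str, List[int]] = {}
    for bam in bam_paths:
        failed_seeds = [seed for seed in seeds if not run_one(bam, seed)]
        if failed_seeds:
            failures[bam] = failed_seeds
    return failures


def make_command_runner(command_template: Sequence[str]) -> Callable[[str, int], bool]:
    """Build a `run_one` function from a command template.

    Each element of the template may contain the hypothetical placeholders
    {bam} and {seed}; they are substituted before the command is run.
    A zero exit code counts as success.
    """

    def run_one(bam: str, seed: int) -> bool:
        cmd = [part.format(bam=bam, seed=seed) for part in command_template]
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.returncode == 0

    return run_one
```

For example, `make_command_runner(["bash", "run_single.sh", "{bam}", "{seed}"])` would call a wrapper script of your own (the name `run_single.sh` is hypothetical) that runs QUILT on that one BAM with nCores=1 and the given seed. A BAM that fails for the same seed on repeated runs is the one worth validating or sharing.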

Hope that helps. Thanks for your message. Best, Robbie

ZheZhang-ZZ commented 2 years ago

Many thanks, Robbie. I will try to reproduce the failures.