Closed pelinunal closed 2 years ago
Hi @pelinunal,
Glad that you have solved this issue. The example scripts are designed for jobs that are submitted via SLURM. Please feel free to make any adjustments suitable for your cluster environment, and let me know if you have any other questions. Thanks!
Best, Xihao
Hi Xihao again, Thank you for your reply and I have another question about Individual_analysis.
The 'gruopid' returns as negative values and therefore start_loc/end_loc are negative.
I do not want to change anything that could affect the results. Could you help me please with this?
Thanks! Pelin
> print(arrayid)
[1] 25 50 70 89 108 123 139 154 162 176 190 204 214 223 232 236 245 254 260 267 271 272
> print(chr)
[1] 22
>if (chr == 1){
groupid <- arrayid
}else{
groupid <- arrayid - cumsum(jobs_num$individual_analysis_num)[chr-1]
}
>print(groupid)
[1] -247 -222 -202 -183 -164 -149 -133 -118 -110 -96 -82 -68 -58 -49 -40 -36 -27 -18 -12 -5 -1 0
>start_loc <- (groupid-1)*10e6 + jobs_num$start_loc[chr]
>end_loc <- start_loc + 10e6 - 1
>end_loc <- min(end_loc,jobs_num$end_loc[chr])
>print(start_loc)
[1] -2459409480 -2209409480 -2009409480 -1819409480 -1629409480 -1479409480 -1319409480 -1169409480 -1089409480 -949409480
[11] -809409480 -669409480 -569409480 -479409480 -389409480 -349409480 -259409480 -169409480 -109409480 -39409480
[21] 590520 10590520
>print(end_loc)
[1] -2449409481
Hi @pelinunal,
Thanks for your question. Could you please print jobs_num
in R, so that we can take a closer look and see why the groupid
and start_loc/end_loc
can be negative?
Best, Xihao
Thank you for reaching out quickly. Here it is;
> print(jobs_num)
chr start_loc end_loc individual_analysis_num sliding_window_num scang_num
1 1 72647 248937242 25 50 166
2 2 10366 242160055 25 49 162
3 3 11859 198179447 20 40 133
4 4 36843 189998586 19 38 127
5 5 13701 181361105 19 37 121
6 6 30000022 170741281 15 29 94
7 7 37325 159334706 16 32 107
8 8 61464 145075204 15 30 97
9 9 61665918 138220359 8 16 52
10 10 11736 133773617 14 27 90
11 11 123276 135076471 14 27 90
12 12 14550 133259472 14 27 89
13 13 18175067 114353606 10 20 65
14 14 18297664 106880915 9 18 60
15 15 19866420 101973541 9 17 55
16 16 10573 36192247 4 8 25
17 17 114930 83238872 9 17 56
18 18 44788 80262395 9 17 54
19 19 71266 58606381 6 12 40
20 20 60435 64331792 7 13 43
21 21 13165333 46696127 4 7 23
22 22 10590520 19999883 1 2 7
Hi @pelinunal,
Thanks for sharing. In this case, you are expected to submit 272 parallel (independent) jobs as sum(jobs_num$individual_analysis_num) = 272
. The problem is that in each of your submitted job, arrayid
should be a unique integer (between 1 and 272, for example, 25), rather than a numeric vector as you posted above.
Hope this helps.
Best, Xihao
Hi Xihao,
I have a general problem with arrayid
and arrayid_longmask
.
We are submitting jobs via LSF cluster environment. arrayid <- as.numeric(R.utils::commandArgs(TRUE)[1])
turns as integer (empty)
.
For individual analysis, adding arrayid <- as.integer(sum(jobs_num$individual_analysis_num))
worked smoothly, as you suggested.
However, for the other steps, we have tried to define arrayid
or arrayid_longmask
via arrayid <- as.integer(sum(group.num.allchr))
. The results turn as NULL
.
Do you have any suggestion? How we can solve this problem?
Thanks! Pelin
Hi @pelinunal,
Thanks for your question. I may not be that familiar with the LSF cluster environment you are using. However, for the SLURM cluster environment, in the associated shell script for the job submission, we specify the following line of command:
#SBATCH --array=1-379 --mem=6000
In this case, the SLURM scheduler will submit 379 jobs in parallel. In each submitted job, there is a unique job id which is an integer between 1 and 379, such that in the R script, the following command
arrayid <- as.numeric(commandArgs(TRUE)[1])
will assign that unique integer to the variable arrayid
. For your case, you can either try to figure out what is the analog of the command #SBATCH --array=1-379 --mem=6000
in the LSF cluster environment; or you can manually submit 379 jobs, and explicitly set arrayid <- 1
, ..., arrayid <- 379
in each job.
Hope this helps.
Best, Xihao
Dear xihaoli,
To perform rare-variant analysis from genotype data, I have been sent batch jobs to our uni cluster using your R scripts, and I came until Individual_analysis, however, I have an issue about:
It returns as "NAreal" and the following codes don't work. Do you have any suggestions to solve this issue?
Thanks!
Here is R info;