Closed SSmith-NUIG closed 2 years ago
Bed file argument doesn't seem to work so posting my solution here in case anyone runs into similar problem
Take your bed file of chromosome regions and convert it to his format:
LG1:0-27754200
LG2:0-16089512
LG3:0-13619445
LG4:0-13404451
LG5:0-13896941
LG6:0-17789102
LG7:0-14198698
LG8:0-12717210
LG9:0-12354651
LG10:0-12360052
LG11:0-16352600
LG12:0-11514234
LG13:0-11279722
LG14:0-10670842
LG15:0-9534514
LG16:0-7238532
MT:0-16343
The next step can be skipped if you want to loop over the previous file but I found it easier this way because of how SLURM arrays work: Split up the previous file into region files for each chromosome using a script like this:
i=0
while read p; do
i=$[i + 1]
echo "$p" > "$i"_region.txt
done < crisp_regions.txt
Then a SLURM array job can be submitted easily using this:
#!/bin/sh
#SBATCH --job-name="CRISP"
#SBATCH --output=/logs/CRISP%A_%a.log
#SBATCH --array=1-17
while IFS= read -r line; do
CRISP.binary \
--bams full_bam_list.txt \
--ref genome.fa \
--poolsize 60 \
--regions $line \
--VCF crisp_"$SLURM_ARRAY_TASK_ID".vcf \
--qvoffset 33 \
--mbq 20 \
--mmq 20 \
--minc 5
done < "$SLURM_ARRAY_TASK_ID"_region.txt
It's now doing each chromosome separately which should speed things up and allow easy concatenation at the end.
Thank you for posting this issue and providing the solution as well. The --regions option is better suited for analyzing chromosomes separately. The --bed option and the --regions can be used together. The --bed option should only output variants in the intervals specified in the file, I will look in the code to see if there is an error.
The code has been updated to fix this problem. Thank you for raising this issue.
Hi, I'm attempting to run CRISP on all chromosomes separately before concatenating into a final VCF file but it doesn't seem to be skipping over the regions not contained in the bed file.
For example, 2.bed contains this:
yet the crisp_2.vcf contains calls from outside of this region:
Here is the top of the logfile:
`
Here is the SLURM script