Closed matt-shenton closed 3 years ago
Dear Matt, thanks for spotting another issue! The current version of overlap removal is only working when using a single window_size. However, the adaptive mode was automatically using multiple window sizes and thus screen multiple window clusters. For now, I just deactivated the use of multiple window sizes in the adaptive mode when using overlap removal. This is less flexible but should at least work. Default is to use a window size of 20 - please make sure that this is not too large for your dataset.
I will think about this more for a long-term fix, but with the current design of overlap removal, this is just not possible.
Best regards Torsten
Dear Torsten,
Thanks for looking at this.
I tried the new version, but still getting the same error:
blocklist<-block_calculation(paste0(invcf,".vcf.gz"),adaptive_mode=FALSE,window_size=5,big_output=TRUE,overlap_remove=TRUE,window_cores=12,parallel_window=pwn,window_overlap=won)
error message: blocklist_list[[index]][[index2]][[12]] でエラー: 添え字が許される範囲外です 呼び出し: block_calculation
Thanks again Matt
I just uploaded a new version. This time it should really work.
While working on the issue I noticed that when using parallel windows there is still some potential overlap between haplotype blocks when using a window_overlap > 0 in those overlapping areas. In your example that was 0.4% areas with overlapping blocks. If that is not wanted you would either have to set window_overlap = 0 or not run your analysis in parallel.
Many thanks again. And thanks for the heads up about the potential overlaps.
Best regards Matt
Hi there,
When I use overlap_remove=TRUE with parallel windows, I get a subscript out of range error.
The process of overlap removal seems to complete:
Start_Overlap_removal: Before: 174 Blocks, 91.65 % Coverage, 144.63 % Overlapping segments. After: 153 Blocks, 74.18 % Coverage, 0 % Overlapping segments. [1] "Finish Target_Coverage Iteration 10" [1] "Used min_majorblock: 1" [1] "Achieved Coverage: 0.74181469871125" [1] "Final Iteration using min_majorblock 1" [1] "Achieved Coverage: 0.74181469871125"
but then throws an error: blocklist_list[[index]][[index2]][[12]] でエラー:
添え字が許される範囲外です 呼び出し: block_calculation
translated: Error in blocklist_list [[index]] [[index2]] [[12]]: It is out of the range where subscripts are allowed Call: block_calculation
If I use overlap_remove=FALSE, the process completes and gives output as expected.
I used the same test data as previously, window size 10000, overlap 1000
I think you can access the vcf file and the bpmap file at the links below:
https://drive.google.com/file/d/1O0dvwTcpLfiqOZT2naz6AHYkTQGfHSrK/view?usp=sharing https://drive.google.com/file/d/1j0CAK4NPru_yWjD7HzvW5ZhKuN3m6EqL/view?usp=sharing
Here is the script I used to run the program
`args = commandArgs(trailingOnly=TRUE) library(vcfR) library(HaploBlocker)
invcf<-args[1] pw<-args[2] wo<-args[3] mybpmap<-read.table(paste0(invcf,".bpmap"),header=FALSE) mybpvector<-mybpmap[,1] pwn<-as.integer(pw) won<-as.integer(wo) blocklist<-block_calculation(paste0(invcf,".vcf"),bp_map=mybpvector,adaptive_mode=TRUE,big_output=TRUE,overlap_remove=TRUE,window_cores=12,parallel_window=pwn,window_overlap=won)
saver<-paste0(invcf,"",pw,"",wo,"overlap.rds") saveRDS(object = blocklist, file = saver)
png(paste0(invcf,"",pw,"",wo,"overlap.png"),width=4800,height=2400)
plot_block(blocklist[[1]],add_sort=FALSE) dev.off()`
I call it using qsub on the univa grid engine system using the following:
`#!/bin/bash
$ -S /bin/bash
$ -cwd
$ -jc M.c12
conda activate haplotype_blocker
invcf=$1 pw=$2 wo=$3
/lustre/home/mshenton/anaconda3/envs/haplotype_blocker/bin/Rscript haploblock_no_remove_pw_wo_test.R ${invcf} ${pw} ${wo} `
Thanks Best wishes Matt