tpook92 / HaploBlocker

R-package: Calculation of haplotype blocks and libraries
GNU General Public License v3.0
26 stars 2 forks source link

overlap_remove using parallel windows - subscript out of range error #12

Closed matt-shenton closed 3 years ago

matt-shenton commented 3 years ago

Hi there,

When I use overlap_remove=TRUE with parallel windows, I get a subscript out of range error.

The process of overlap removal seems to complete:

Start_Overlap_removal: Before: 174 Blocks, 91.65 % Coverage, 144.63 % Overlapping segments. After: 153 Blocks, 74.18 % Coverage, 0 % Overlapping segments. [1] "Finish Target_Coverage Iteration 10" [1] "Used min_majorblock: 1" [1] "Achieved Coverage: 0.74181469871125" [1] "Final Iteration using min_majorblock 1" [1] "Achieved Coverage: 0.74181469871125"

but then throws an error: blocklist_list[[index]][[index2]][[12]] でエラー:
添え字が許される範囲外です 呼び出し: block_calculation

translated: Error in blocklist_list [[index]] [[index2]] [[12]]: It is out of the range where subscripts are allowed Call: block_calculation

If I use overlap_remove=FALSE, the process completes and gives output as expected.

I used the same test data as previously, window size 10000, overlap 1000

I think you can access the vcf file and the bpmap file at the links below:

https://drive.google.com/file/d/1O0dvwTcpLfiqOZT2naz6AHYkTQGfHSrK/view?usp=sharing https://drive.google.com/file/d/1j0CAK4NPru_yWjD7HzvW5ZhKuN3m6EqL/view?usp=sharing

Here is the script I used to run the program

`args = commandArgs(trailingOnly=TRUE) library(vcfR) library(HaploBlocker)

invcf<-args[1] pw<-args[2] wo<-args[3] mybpmap<-read.table(paste0(invcf,".bpmap"),header=FALSE) mybpvector<-mybpmap[,1] pwn<-as.integer(pw) won<-as.integer(wo) blocklist<-block_calculation(paste0(invcf,".vcf"),bp_map=mybpvector,adaptive_mode=TRUE,big_output=TRUE,overlap_remove=TRUE,window_cores=12,parallel_window=pwn,window_overlap=won)

saver<-paste0(invcf,"",pw,"",wo,"overlap.rds") saveRDS(object = blocklist, file = saver)

png(paste0(invcf,"",pw,"",wo,"overlap.png"),width=4800,height=2400)

plot_block(blocklist[[1]],add_sort=FALSE) dev.off()`

I call it using qsub on the univa grid engine system using the following:

`#!/bin/bash

$ -S /bin/bash

$ -cwd

$ -jc M.c12

conda activate haplotype_blocker

invcf=$1 pw=$2 wo=$3

/lustre/home/mshenton/anaconda3/envs/haplotype_blocker/bin/Rscript haploblock_no_remove_pw_wo_test.R ${invcf} ${pw} ${wo} `

Thanks Best wishes Matt

tpook92 commented 3 years ago

Dear Matt, thanks for spotting another issue! The current version of overlap removal is only working when using a single window_size. However, the adaptive mode was automatically using multiple window sizes and thus screen multiple window clusters. For now, I just deactivated the use of multiple window sizes in the adaptive mode when using overlap removal. This is less flexible but should at least work. Default is to use a window size of 20 - please make sure that this is not too large for your dataset.

I will think about this more for a long-term fix, but with the current design of overlap removal, this is just not possible.

Best regards Torsten

matt-shenton commented 3 years ago

Dear Torsten,

Thanks for looking at this.

I tried the new version, but still getting the same error:

blocklist<-block_calculation(paste0(invcf,".vcf.gz"),adaptive_mode=FALSE,window_size=5,big_output=TRUE,overlap_remove=TRUE,window_cores=12,parallel_window=pwn,window_overlap=won)

error message: blocklist_list[[index]][[index2]][[12]] でエラー: 添え字が許される範囲外です 呼び出し: block_calculation

Thanks again Matt

tpook92 commented 3 years ago

I just uploaded a new version. This time it should really work.

While working on the issue I noticed that when using parallel windows there is still some potential overlap between haplotype blocks when using a window_overlap > 0 in those overlapping areas. In your example that was 0.4% areas with overlapping blocks. If that is not wanted you would either have to set window_overlap = 0 or not run your analysis in parallel.

matt-shenton commented 3 years ago

Many thanks again. And thanks for the heads up about the potential overlaps.

Best regards Matt