tpook92 / HaploBlocker

R-package: Calculation of haplotype blocks and libraries
GNU General Public License v3.0

overlapping blocks #3

Open xingwu2 opened 5 years ago

xingwu2 commented 5 years ago

Hi,

I wonder why some blocks overlap with other blocks. As shown in the tutorial, Block 1 spans SNP 1 to 89, and Block 2 spans SNP 1 to 955. When I ran the program on my own data, I also found that Block 1 spanned SNP 21 to 100 and Block 2 spanned SNP 20 to 224. I am confused about why the blocks are not independent.

Best

Xing

tpook92 commented 5 years ago

Hi Xing, it is true that blocks in our approach can overlap. However, blocks spanning the same markers do not necessarily overlap. In addition to its physical position, each block also represents a specific allele sequence. Several blocks covering the same markers therefore just means that there are multiple distinct allele sequences in that region (which should basically always be the case).
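
To make this concrete, here is a small toy sketch in Python. It is not HaploBlocker code, and the `Block` class, the `carriers` helper and the four haplotypes are all made up for illustration. It shows two blocks that span the same markers but represent different allele sequences, so they are carried by different haplotypes.

```python
# Toy illustration only; none of these names come from HaploBlocker.
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    start: int    # first marker (0-based, inclusive)
    end: int      # last marker (0-based, inclusive)
    alleles: str  # allele sequence the block represents


def carriers(block, haplotypes):
    """Return the indices of haplotypes that carry the block's allele sequence."""
    return {
        i
        for i, hap in enumerate(haplotypes)
        if hap[block.start:block.end + 1] == block.alleles
    }


# Hypothetical data: four haplotypes over four markers.
haplotypes = ["AACG", "AACG", "TTCG", "TTCG"]

# Both blocks cover markers 0-1, but with different allele sequences.
block_1 = Block(start=0, end=1, alleles="AA")
block_2 = Block(start=0, end=1, alleles="TT")

carriers_1 = carriers(block_1, haplotypes)
carriers_2 = carriers(block_2, haplotypes)

# No haplotype can carry both sequences at the same markers.
shared_carriers = carriers_1 & carriers_2
print(carriers_1, carriers_2, shared_carriers)
```

In this toy setup the two blocks share their marker range but have no haplotype in common.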

To derive more traditional window-based haplotype blocks, you can use the function block_windowdataset() (I just uploaded some additional functionality for that function).

Overlap can occur when a large group of individuals shares a short sequence of alleles and a subset of those individuals shares a longer sequence in the same region. I am currently working on an additional function to remove any overlap from the dataset. It should be available quite soon.
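
The nested situation described here can be sketched with another toy example in Python. Again, this is not HaploBlocker code. The haplotypes, the helper functions and the idea of measuring overlap as shared (haplotype, marker) cells are assumptions made purely for illustration.

```python
# Toy illustration only; data and helper names are hypothetical.

# Four haplotypes over five markers. All share "ACG" at markers 0-2,
# and the first two additionally share "TA" at markers 3-4.
haplotypes = ["ACGTA", "ACGTA", "ACGCC", "ACGGT"]


def block_carriers(start, alleles, haps):
    """Indices of haplotypes that carry `alleles` starting at marker `start`."""
    end = start + len(alleles)
    return {i for i, hap in enumerate(haps) if hap[start:end] == alleles}


def block_cells(start, alleles, haps):
    """All (haplotype, marker) cells covered by a block."""
    return {
        (i, marker)
        for i in block_carriers(start, alleles, haps)
        for marker in range(start, start + len(alleles))
    }


# Short block shared by the big group, long block shared by a subset.
short_carriers = block_carriers(0, "ACG", haplotypes)
long_carriers = block_carriers(0, "ACGTA", haplotypes)

# Cells that are covered by both blocks, i.e. where they overlap.
overlap_cells = block_cells(0, "ACG", haplotypes) & block_cells(0, "ACGTA", haplotypes)
print(sorted(short_carriers), sorted(long_carriers), len(overlap_cells))
```

Here the long block's carriers are a strict subset of the short block's carriers, and the two blocks cover the same cells for those haplotypes at markers 0 to 2.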

Best regards

tpook92 commented 4 years ago

The newest version of HaploBlocker can now identify non-overlapping blocks. To activate this mode, set overlap_remove to TRUE in block_calculation().