tpook92 / HaploBlocker

R-package: Calculation of haplotype blocks and libraries
GNU General Public License v3.0
28 stars 2 forks source link

typo in plot_block() + Usage of HaploBlocker on small populations #2

Open JennyHTLee opened 5 years ago

JennyHTLee commented 5 years ago

Hi,

I would like to report on a typo in the plot_block function for the option "orientation"

plot_block(parent.t.sliced_blocklist, orientation="front") Error in plot_block(parent.t.sliced_blocklist, orientation = "front") : unused argument (orientation = "front") plot_block(parent.t.sliced_blocklist, oriantation="front")

I just started to test out HaploBlocker, so I've been playing around with the parameters. Thanks for your work.

Cheers, Jenny

tpook92 commented 5 years ago

You are indeed right. The parameter was falsely named oriantation and not orientation. I uploaded a new version that is fixing this issue (version 1.4.7).

JennyHTLee commented 5 years ago

Thanks,

I have a more general question: I am trying to identify haplotype blocks in a very small population (10 genotypes) with the aim of tracing recombination from parent to F2s. The experimental design is therefore quite different from usual where you'd prefer a largely diverse population to confidently call blocks. My question is which parameters you would recommend to modify? I guess I would need to reduce the confidence and coverage cutoff? I've tried window_size and node_min so far.

Happy to hear your suggestions.

Cheers, Jenny

tpook92 commented 5 years ago

Boundaries of haplotype blocks in HaploBlocker are not really reflective of recent recombinations. We are more or less tracking ancient recombinations and segments in the genome shared by groups of individuals - in case founder genotypes/haplotypes are available I would recommend to go for traditional pair-wise IBD based approaches.

As designed HaploBlocker is tracking group-wise IBD and therefore block boundaries will not maximize block length for pairs of haplotypes.

To set more of a focus on pairs, you should reduce node_min and edge_min to 2. Additional you should reduce min_majorblock heavily or set a target coverage to the share of the dataset you want to cover by blocks. To maximize block length you could additional set double_share to something below 1 - if computing time is of no issue I would suggest to go for 0.5-0.6 so blocks of 3 haplotypes can be further split up.

In case both founder and F2 genotypes are available you could also consider splitting the dataset into two subgroups and requiring each block to contain at least 1 haplotype from both the founders and the F2.

JennyHTLee commented 5 years ago

Thank you very much for your useful comments!

JennyHTLee commented 5 years ago

I run into the error

Error in blocklist[[index + first_block - 1]] : subscript out of bounds

sometimes when I adjust the parameters. Is it because the number of nodes are too low? How can I resolve this?

Example:

block_calculation(s1t.t.s.pos[,-1], prefilter="TRUE", maf=0.05, node_min=2, edge_min=2, target_coverage=0.4, double_share=0.5) Start_blockinfo_calculation ..........................Start_nodes_calculation ..........................Start_simple_merge: 1303 Start_CrossMerging_full Iteration 1 : 856 nodes Iteration 2 : 441 nodes Iteration 3 : 318 nodes Iteration 4 : 263 nodes Iteration 5 : 251 nodes Start_IgnoreSmall Iteration 1 : 251 nodes Iteration 2 : 27 nodes Iteration 3 : 15 nodes Start_Blockmerging Iteration 1 : 15 blocks Iteration 2 : 13 blocks Error in blocklist[[index + first_block - 1]] : subscript out of bounds

Thank you again

tpook92 commented 5 years ago

This error is caused by no blocks being in the haplotype library at some point of the algorithm. As your dataset is relatively small this can happen as min_majorblock always starts at 5000 - I will add something to account for this in the next version of the package. Short-term you can solve it by setting min_majorblock to a small value (e.g. 50)