hringbauer / ancIBD

Detecting IBD within low coverage ancient DNA data. Development Repository for software package that contains code for manuscript.
GNU General Public License v3.0

add option to use maskfile #12

Open zmaroti opened 11 months ago

zmaroti commented 11 months ago

Hi,

It would be nice if you could add a maskfile option as a parameter to either hapBLOCK_chroms (so that no IBD is emitted from masked areas, since all the relevant genome coordinate info is available there) or to filter_ibd_df plus the callers create_ind_ibd_df and ind_all_ibd_df (to filter IBD instead of, or in addition to, the SNP density parameter). This could be handled naturally in the base package.

(The individual IBD data in the output of hapBLOCK_chroms does not (yet) contain the genomic coordinates, and its map positions are not on the same scale as the mask data (M vs. cM), so doing this with simple "shell magic" would be complicated.)
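To make the request concrete, here is a minimal sketch of such a mask filter in plain Python. It is not ancIBD code. The function names (`overlap_cm`, `filter_masked`), the `max_frac` threshold, and the mask layout (chromosome mapped to a list of start/end positions in cM) are hypothetical. The segment keys `ch`, `StartM`, `EndM`, `iid1` and `iid2` are assumed column names, with positions in Morgan.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in centimorgans


def overlap_cm(seg: Interval, masks: List[Interval]) -> float:
    """Total length of seg (in cM) covered by the mask intervals.

    Assumes the mask intervals of one chromosome do not overlap each other.
    """
    start, end = seg
    return sum(max(0.0, min(end, m_end) - max(start, m_start))
               for m_start, m_end in masks)


def filter_masked(segments: List[dict],
                  mask: Dict[int, List[Interval]],
                  max_frac: float = 0.5) -> List[dict]:
    """Keep segments whose masked fraction is at most max_frac (hypothetical cutoff)."""
    kept = []
    for seg in segments:
        # Assumption: segment positions are in Morgan, the mask is in cM.
        start_cm = seg["StartM"] * 100.0
        end_cm = seg["EndM"] * 100.0
        length = end_cm - start_cm
        covered = overlap_cm((start_cm, end_cm), mask.get(seg["ch"], []))
        if length > 0 and covered / length <= max_frac:
            kept.append(seg)
    return kept


# Hypothetical mask: chromosome -> list of (start_cM, end_cM) regions.
mask = {1: [(50.0, 60.0)]}
segments = [
    {"iid1": "A", "iid2": "B", "ch": 1, "StartM": 0.52, "EndM": 0.58},  # inside the mask
    {"iid1": "A", "iid2": "C", "ch": 1, "StartM": 0.10, "EndM": 0.30},  # outside the mask
    {"iid1": "B", "iid2": "C", "ch": 2, "StartM": 0.52, "EndM": 0.58},  # unmasked chromosome
]
kept = filter_masked(segments, mask)
print([(s["iid1"], s["iid2"]) for s in kept])
```

Dropping a segment only when more than a set fraction of it is masked is one possible design choice. Trimming segments at the mask boundaries would be another, but it changes segment lengths and so interacts with any length cutoff applied later.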

With only a few samples, or when looking at IBD sharing for individual pairs, this is not an issue. But when you work with several hundred individuals, the number of pairs (N*(N-1)/2) gets large, and at these genome locations almost everyone will share IBD with all other samples. As a result, a needlessly large portion of the output consists of these false-positive IBD segments rather than the randomly distributed true IBD.

Thanks!

hringbauer commented 10 months ago

That is an excellent suggestion showing some deep competence. Thank you!

We will work on implementing it, as it could substantially speed up the post-processing for large datasets. I will leave the thread open until then.