tpook92 / HaploBlocker

R-package: Calculation of haplotype blocks and libraries
GNU General Public License v3.0

How to prepare input from a ped file with genotyping data #4

Open kroma007 opened 4 years ago

kroma007 commented 4 years ago

Dear Torsten,

I am very interested in using the HaploBlocker tool to perform haplotype block analysis of my data. Haploview with the Gabriel method gave nothing special, so I thought that your algorithm might be more helpful here. However, I think I have a problem with the correct input format. Please note that I am not an advanced bioinformatics user and I am still learning this ;)

I have genotyping data of 190 human samples for 17 SNPs from one gene in standard plink format, so there is a ped file. Each of these SNPs has two alleles, so each sample has 34 alleles. A row looks like this (the first six columns of the ped file hold sample information; columns are separated by tabs, and the two alleles of the same SNP by a space):

22 subject-22 0 0 0 1 3 3 1 1 1 1 2 2 2 1 1 1 2 2 2 2 4 4 3 3 3 3 3 3 1 1 1 1 3 3 4 2 1 1
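To make the row layout described above concrete, here is a minimal base-R sketch that splits the example row into its six information fields and 17 allele pairs. The variable names are illustrative only and are not part of HaploBlocker or plink.

```r
# Example row from the question: 6 information fields, then 2 alleles per SNP.
ped_line <- "22 subject-22 0 0 0 1 3 3 1 1 1 1 2 2 2 1 1 1 2 2 2 2 4 4 3 3 3 3 3 3 1 1 1 1 3 3 4 2 1 1"

# Split on any run of whitespace, so tabs and spaces are both handled.
fields <- strsplit(trimws(ped_line), "[[:space:]]+")[[1]]

# Standard ped columns: family ID, sample ID, father, mother, sex, phenotype.
info    <- fields[1:6]
# Everything after the sixth column is allele data.
alleles <- fields[-(1:6)]

n_snps <- length(alleles) / 2

# One row per SNP; columns hold the first and second allele as listed in the file.
allele_pairs <- matrix(
  alleles, ncol = 2, byrow = TRUE,
  dimnames = list(paste0("SNP", seq_len(n_snps)), c("allele1", "allele2"))
)

n_snps
allele_pairs
```

A quick check like this confirms that every row has 6 + 2 * 17 = 40 fields before the file is passed to any downstream tool.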

When I tried to run the block_calculation function, I got:

Error in command 'fixcoding(unique.dhm)':
   maximum number of 'values' is 256. Got 395.

I saw in the instructions that the standard input is a haplotype dataset; however, I have a ped file with two alleles per SNP. On the other hand, your publication presents analyses of huge datasets, so this must be possible somehow, but I do not know how to do it for data like mine. Could you explain how to prepare a correct input file for HaploBlocker from my dataset, if that is possible at all for such data? Thank you in advance, I will be very glad :)

Marcin Słomka

tpook92 commented 4 years ago

Dear Marcin,

with the newest version (1.5.8) you can just enter the path of your ped file as input in dhm:

blocklist <- block_calculation(dhm = "path_to_your_ped_file.ped")

Note, however, that HaploBlocker requires a phased dataset, and the first entry of a marker will always be interpreted as the first haplotype. Furthermore, I would assume that your dataset is extremely small compared to the block structures HaploBlocker was originally intended for. I would assume that you have to adapt the window_size (maybe even start with single SNPs as windows) and set a target_coverage.
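As a rough illustration of the point about phasing, the base-R sketch below shows how the first and second allele of each marker would end up as two separate haplotypes. The toy data, the variable names, and the matrix layout (markers in rows, haplotypes in columns) are assumptions made for this example; the package documentation is the reference for the layout HaploBlocker actually expects.

```r
# Two toy samples with 3 SNPs each (hypothetical data in ped layout).
ped_lines <- c(
  "1 s1 0 0 0 1 3 3 1 2 4 4",
  "2 s2 0 0 0 1 3 1 2 2 4 2"
)

fields  <- strsplit(trimws(ped_lines), "[[:space:]]+")
# Drop the six information columns; result is samples x (2 * SNPs).
alleles <- t(sapply(fields, function(x) x[-(1:6)]))

# First and second allele of every SNP, as ordered in the file.
first  <- alleles[, seq(1, ncol(alleles), by = 2), drop = FALSE]
second <- alleles[, seq(2, ncol(alleles), by = 2), drop = FALSE]

# Assumed layout: markers in rows, haplotypes in columns,
# with the two haplotypes of each sample side by side.
n <- nrow(alleles)
haplo <- matrix(NA_character_, nrow = ncol(first), ncol = 2 * n)
haplo[, seq(1, 2 * n, by = 2)] <- t(first)
haplo[, seq(2, 2 * n, by = 2)] <- t(second)
colnames(haplo) <- paste0(rep(c("s1", "s2"), each = 2), c("_h1", "_h2"))

haplo
```

If the genotypes are unphased, the order of the two alleles at a heterozygous marker is arbitrary, so haplotypes built this way mix alleles from both chromosomes. That is why a phased dataset is needed.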

I would suggest trying:

blocklist <- block_calculation(dhm = "path_to_your_ped_file.ped", window_size = 1, target_coverage = 0.9, merging_error = 0)

Best regards,
Torsten