tpbilton / GUSMap

Genotyping Uncertainty with Sequencing data and linkage MAPping
GNU General Public License v3.0

Too many SNPs are getting filtered #20

Open raphaelbetschart opened 4 years ago

raphaelbetschart commented 4 years ago

Hey Timothy,

I am currently trying to create a linkage map with GUSMap. Unfortunately, many SNPs get discarded, even when I choose the parameters so that filtering is minimal. I start with around 300'000 SNPs, but only 3'000 remain after filtering. If I then continue as in your tutorial, almost all SNPs end up in one linkage group, while the other groups contain only 2-3 SNPs. Do you have any idea what I am doing wrong?

tpbilton commented 4 years ago

Hi,

There could be a number of reasons why most of the SNPs are being filtered out. I assume you have played around with different filtering criteria; which one tends to have the greatest effect on the SNPs may give insight into what is going on in your dataset.

Also, how deeply were your parents sequenced? It could be that GUSMap is struggling to infer the parental genotypes, especially if the parents have low depths (it is generally better if the parents have higher depth than the offspring). One option is to set the argument inferSNPs=FALSE in the makeFS function (this is a later addition to the package that I haven't included in the tutorial yet). This will infer the segregation type of the SNPs from the progeny, but the problem is that maternal-informative (MI) SNPs cannot be distinguished from paternal-informative (PI) SNPs. Hence, the SNPs that have their segregation type inferred from the parental genotypes are still needed to form the linkage groups. The $addSNPs function can then be used to add the SNPs with inferred segregation type to the already formed linkage groups.
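To see why MI and PI SNPs look the same from the progeny alone, here is a minimal Python sketch (not GUSMap code; the function name `offspring_distribution` is made up for illustration). It assumes a diploid, biallelic SNP with Mendelian segregation and computes the expected offspring genotype proportions for each type of cross.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(mother, father):
    """Expected offspring genotype proportions for one biallelic SNP.

    Each parent is a two-character genotype such as "AB"; every
    combination of one maternal and one paternal allele is equally likely.
    """
    counts = Counter("".join(sorted(m + f)) for m, f in product(mother, father))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


mi = offspring_distribution("AB", "AA")  # maternal-informative cross
pi = offspring_distribution("AA", "AB")  # paternal-informative cross
bi = offspring_distribution("AB", "AB")  # both-informative cross

print("MI:", mi)
print("PI:", pi)
print("BI:", bi)
print("MI and PI indistinguishable from progeny:", mi == pi)
```

Both the MI and the PI cross give a 1:1 ratio of AA to AB offspring, so the progeny genotypes alone cannot say which parent is heterozygous, whereas the both-informative cross gives a distinct 1:2:1 ratio.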

In terms of getting only one large LG: have you tried using a higher LOD threshold? My guess is that the LOD threshold is too low, so SNPs from different chromosomes are getting merged into the same linkage group. A heatmap generated with the $plotLG function could highlight what is going on here.
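The effect of the threshold can be sketched with a toy example in Python. This is not GUSMap's grouping algorithm, which may work differently; it is a simple connected-components grouping in which two SNPs are linked whenever their pairwise LOD score reaches the threshold. The function `group_snps` and the LOD values are made up for illustration: SNPs 0-2 stand for one chromosome, SNPs 3-5 for another, with one moderately high spurious score between SNPs 2 and 3.

```python
def group_snps(lod, threshold):
    """Group SNPs by linking every pair whose LOD score is >= threshold."""
    unassigned = set(range(len(lod)))
    groups = []
    while unassigned:
        start = min(unassigned)
        unassigned.remove(start)
        group, stack = [start], [start]
        while stack:
            i = stack.pop()
            # Collect the still-unassigned SNPs linked to SNP i.
            linked = [j for j in unassigned if lod[i][j] >= threshold]
            for j in linked:
                unassigned.remove(j)
                group.append(j)
                stack.append(j)
        groups.append(sorted(group))
    return groups


# Made-up symmetric matrix of pairwise LOD scores (illustration only).
lod = [
    [0, 12, 11, 1, 0, 1],
    [12, 0, 13, 2, 1, 0],
    [11, 13, 0, 4, 1, 1],
    [1, 2, 4, 0, 12, 11],
    [0, 1, 1, 12, 0, 14],
    [1, 0, 1, 11, 14, 0],
]

low = group_snps(lod, threshold=3)
high = group_snps(lod, threshold=10)
print("threshold 3: ", low)
print("threshold 10:", high)
```

With the low threshold, the single spurious score between SNPs 2 and 3 is enough to chain both chromosomes into one group; with the higher threshold they separate into two groups.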

It is a bit hard to work out exactly what is going on without seeing the data. Some output and plots would really help here. If you are OK with sharing some of this, then maybe email me (tbilton@maths.otago.ac.nz) directly and we can discuss this more privately.

Hope this helps.

Timothy