Closed yuyu614 closed 5 months ago
Hi,
Thank you very much for your prompt response. I fear I may not have been entirely clear in my initial query, so I'd like to clarify my situation a bit further. My dataset exhibits strong familial relationships; out of 5000 individuals, approximately 2000 have kinship ties. Upon utilizing the pipeline and reaching the "Extract unrelated samples" stage, I found that it yielded only 800 individuals deemed unrelated. I am concerned about how this might affect subsequent analyses. Additionally, I'm curious whether the count of unrelated individuals needs to match the real-world scenario exactly or if it's just a characteristic suitable for this pipeline's application.
Moreover, I have a question regarding the calculation of genetic divergence, which uses the formula double div = ((double)(nhethet-2*nhomopp)) / ((double) (nhet[sampi] + nhet[sampj])), while the cutoff is determined by -2^-(degree+1.5). From what I understand, the degree of kinship typically considers the consistency of genotypes at the same locus, and genetic divergence assesses the proportion of heterozygotes and homozygotes between two samples. I'm interested in understanding how the cutoff is set in this context and how it integrates with the kinship determination criteria.
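To make the question concrete, here is a minimal Python sketch of how I currently read the two pieces. The function names are my own and are not taken from the pipeline. The cutoff function computes only the magnitude 2^-(degree+1.5); the leading minus sign I quoted above is left out because I am not sure how it is applied.

```python
def pairwise_statistic(nhethet, nhomopp, nhet_i, nhet_j):
    """Mirror of the quoted expression:
    (nhethet - 2*nhomopp) / (nhet[sampi] + nhet[sampj]).

    nhethet : sites where both samples are heterozygous
    nhomopp : sites where the samples are opposite homozygotes
    nhet_i, nhet_j : heterozygous site counts of each sample
    """
    denom = nhet_i + nhet_j
    if denom == 0:
        raise ValueError("no heterozygous sites in either sample")
    return (nhethet - 2 * nhomopp) / denom


def cutoff_magnitude(degree):
    """Magnitude of the quoted cutoff, 2^-(degree + 1.5)."""
    return 2.0 ** -(degree + 1.5)


if __name__ == "__main__":
    # Made-up counts, only to show the arithmetic.
    print(pairwise_statistic(100, 10, 150, 170))
    for d in (1, 2, 3):
        print(d, cutoff_magnitude(d))
```

With these definitions the cutoff magnitude halves with each additional degree, which is the part I would like to relate to the kinship criteria.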
Thank you once again for your assistance, and I look forward to your further guidance.
Thank you for your guidance and the valuable insights you've shared. I appreciate it greatly.
1. Regarding the removeHigherDegree function: I noticed that when the degree parameter is set to 2, removeHigherDegree removes relationships marked as "3rd" but not those marked as "4th". Could you explain the rationale behind this? I would have expected that, for degree=2, the function removes all relationships more distant than the specified degree, including both "3rd" and "4th".
2. Dataset with a specific PropIBD range: About 90% of the PropIBD values in my dataset fall within the 0.2-0.3 range. The dataset includes about 5000 samples, of which 2000 have kinship relations. Given this, I'm concerned about whether the current pipeline is suitable for my data. Is there a particular approach or modification to the pipeline that you would recommend for datasets with such a high concentration of related samples within this PropIBD range?
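For the first question above, the behaviour I expected could be sketched as follows. This is only an illustration in Python: the label ordering, the tuple layout, and the function name keep_up_to_degree are my assumptions and are not the pipeline's removeHigherDegree.

```python
# Hypothetical ordering of the relationship labels, closest first.
DEGREE_ORDER = {"1st": 1, "2nd": 2, "3rd": 3, "4th": 4}


def keep_up_to_degree(pairs, degree):
    """Keep only pairs whose label is at or below `degree`.

    pairs  : list of (sample_a, sample_b, label) tuples
    degree : integer threshold, e.g. 2 keeps "1st" and "2nd"
    """
    return [p for p in pairs if DEGREE_ORDER[p[2]] <= degree]


if __name__ == "__main__":
    # Made-up pairs, only to show the expected filtering.
    pairs = [
        ("A", "B", "1st"),
        ("A", "C", "2nd"),
        ("B", "D", "3rd"),
        ("C", "E", "4th"),
    ]
    print(keep_up_to_degree(pairs, 2))
```

Under this reading, both the "3rd" and "4th" pairs are dropped when degree is 2, which differs from what I observe.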
I appreciate your time and any guidance you can provide.