popgenDK / SATC

Sex assignment through coverage

filters in sexDetermine() #7

Open: RAWWiberg opened this issue 2 years ago

RAWWiberg commented 2 years ago

Hello,

I am wondering what the rationale is behind some of the filters in the sexDetermine() function, specifically in this line:

X_Z_Scaffold <- beta > 0.4 & beta < 0.6  & homoMedian<1.3 & homoMedian>0.7 & heteroMedian<0.7 & heteroMedian>0.3 & sexAssoScafs

From the paper, I think I understand the beta > 0.4 & beta < 0.6 part, but I am not sure why you set minimum and maximum normalised coverage thresholds for the homomorphic and heteromorphic individuals (i.e. the homoMedian<1.3 & homoMedian>0.7 & heteroMedian<0.7 & heteroMedian>0.3 part).

In my data, I have several contigs that we have prior reason to suspect are X- (or Y-) linked, and I am checking whether the SATC pipeline agrees. All of my putatively X-linked contigs show significant differences in normalised coverage between the sexes, but because the values are ~1 in males (XY) and ~2 in females (XX), they do not pass this X_Z_Scaffold filter.

I would love to hear your thoughts on this. Do you have a strong reason for applying these additional filters that I am missing?

Best wishes,
Axel
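To make the failure concrete, the quoted X_Z_Scaffold condition can be wrapped in a small R function and applied to the two coverage patterns discussed in this thread. This is a sketch under stated assumptions: the wrapper name `passesXZFilter` is hypothetical (not part of SATC), `beta = 0.5` is an assumed value inside the beta window, and `sexAssoScafs` is set to TRUE on the assumption that it flags scaffolds with a significant coverage difference between the sexes.

```r
# Hypothetical wrapper around the X_Z_Scaffold condition quoted above (not SATC's API).
passesXZFilter <- function(beta, homoMedian, heteroMedian, sexAssoScafs) {
  beta > 0.4 & beta < 0.6 &                   # beta window from the paper
    homoMedian < 1.3 & homoMedian > 0.7 &     # homomorphic (XX/ZZ) depth near 1.0
    heteroMedian < 0.7 & heteroMedian > 0.3 & # heteromorphic (XY/ZW) depth near 0.5
    sexAssoScafs                              # assumed: significant depth difference between sexes
}

# Pattern the thresholds are built around: ~1.0 in XX, ~0.5 in XY.
expected <- passesXZFilter(beta = 0.5, homoMedian = 1.0, heteroMedian = 0.5, sexAssoScafs = TRUE)

# Pattern reported in the question: ~2 in females (XX), ~1 in males (XY).
observed <- passesXZFilter(beta = 0.5, homoMedian = 2.0, heteroMedian = 1.0, sexAssoScafs = TRUE)

print(c(expected = expected, observed = observed))
```

Under these assumptions, the ~2/~1 pattern fails only on the homoMedian and heteroMedian bounds; halving both values would bring it back inside them.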

casia16 commented 2 years ago

Hi, the reason behind those thresholds is that, after normalization, the expected depth of sex-linked scaffolds is ~1.0 in homomorphic and ~0.5 in heteromorphic individuals. It seems that your normalization does not work as we expect. Perhaps you can visually check whether the normalizing scaffolds (by default, the 5 longest) are behaving nicely by looking at the normalized depth plot.
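The normalization described in this reply can be sketched as follows, assuming each scaffold's depth is divided by the mean depth of the longest scaffolds (5 by default). The function `normalizeDepth`, the use of the mean as the summary, and the toy depths are all hypothetical illustrations, not SATC's code. The toy data show one way a misbehaving reference set could produce the ~2 (XX) and ~1 (XY) pattern from the question.

```r
# Hypothetical sketch of per-individual depth normalization; not SATC's code.
# Each scaffold's depth is divided by the mean depth of the nTop longest scaffolds.
normalizeDepth <- function(depth, scafLen, nTop = 5) {
  ref <- order(scafLen, decreasing = TRUE)[seq_len(min(nTop, length(scafLen)))]
  depth / mean(depth[ref])
}

# Toy data: scaffolds 1-5 are the longest (reference), 6 is a shorter autosome, 7 is X-linked.
scafLen <- c(900, 800, 700, 600, 500, 300, 100)

# Well-behaved reference: autosomes at depth 20, X at 20 in XX and 10 in XY.
goodXX <- normalizeDepth(c(20, 20, 20, 20, 20, 20, 20), scafLen)
goodXY <- normalizeDepth(c(20, 20, 20, 20, 20, 20, 10), scafLen)

# Misbehaving reference (hypothetical): the 5 longest scaffolds sit at half the autosomal depth.
badXX <- normalizeDepth(c(10, 10, 10, 10, 10, 20, 20), scafLen)
badXY <- normalizeDepth(c(10, 10, 10, 10, 10, 20, 10), scafLen)

print(rbind(goodXX, goodXY, badXX, badXY))
```

In the misbehaving case the X-linked scaffold lands at ~2 and ~1, and the non-reference autosome also sits at ~2 instead of ~1; that kind of offset is what the normalized depth plot should reveal.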