simonhmartin / genomics_general

General tools for genomic analyses.
330 stars 92 forks source link

Fst analysis on haploid sex chromosomes #39

Open foala opened 3 years ago

foala commented 3 years ago

Hi Simon, referring to your paper here (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3814882/), I have done a similar comparison between autosomal data and haploid sex chromosome data (homogametic individuals) using Fst (popgenWindows.py). However, I found some comments from people saying that Fst doesn't work on haploid data, since it depends on calculating expected heterozygosity. May I know if there is any substance to that claim? I am using homogametic birds (ZW).

Thank you very much

simonhmartin commented 3 years ago

There are various ways to compute Fst. My scripts use nucleotide diversity rather than heterozygosity, following equation 9 here: https://doi.org/10.1093/oxfordjournals.molbev.a040703 In principle, I don't think using the method based on expected heterozygosity is flawed for haploid data, since it is simply a measure of diversity based on allele frequencies and not a measure of the actual number of heterozygous individuals. However, if you want to make a direct comparison between autosomes and haploid sex chromosomes, it is important to ensure that haploid parts of the genome are interpreted as haploid by the software, otherwise the amount of diversity would be underestimated (as they would be interpreted as homozygous diploids). Finally, note that 'homogametic' refers to ZZ (males in birds) or XX (female mammals), ZW and XY are heterogametic. If using only the homogametic sex, you don't need to worry about ploidy issues.

foala commented 3 years ago

Hi Simon, Thank you very for much for your clarification. Indeed, I meant Heterogametic. Still getting confused with the ZW System!

Regarding the part to make the software interpret them as haploid, that is true: My Sex chromosome VCF files (they are separate from the autosomes) show my data as diploid homozygous. And I want to make direct comparison between autosomes and each of the sex chromosomes(In total, 3 Vcf files)

May I know if there is a way to do that with your scripts? Or do you suggest any other software? When I tried VCFTools, it calculates the fst, but says in the log that it will only consider diploid sites (which my haploid sites appears to be in the VCF file anyway) Thank you very much.

simonhmartin commented 3 years ago

In my script parseVCF.py you can specify the ploidy for each sample in a file with the option --ploidyFile. The file has two columns: sample ID and ploidy (2 for diploid and 1 for haploid).

simonhmartin commented 3 years ago

In my script parseVCF.py you can specify the ploidy for each sample in a file with the option --ploidyFile. The file has two columns: sample ID and ploidy (2 for diploid and 1 for haploid).

Note you would have to run this separately for the sex chromosome because all individuals are diploid for autosomes.