Status: Open. Orz-CQ opened this issue 1 year ago.
Thanks! This sounds like a really cool system! I think what you are describing, using the phased chromosomes/haplotypes as "individuals" from the same population or species, makes sense. So your map file would look something like this:
map.txt
hap1 diploid
hap2 diploid
hap1 triploid
hap2 triploid
hap3 triploid
hap1 tetraploid
hap2 tetraploid
hap3 tetraploid
hap4 tetraploid
The only other thing: do you have a fourth species to use as an outgroup?
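A minimal sketch of building such a map file, assuming one line per phased haplotype in two whitespace-separated columns (individual, taxon) as in the example above. The `<taxon>_hap<i>` naming scheme, the `build_map_lines` helper and the outgroup name `out1` are hypothetical; the point is that every haplotype gets a unique name that matches its sequence name in the alignment, and that all haplotypes of a taxon are listed together.

```python
def build_map_lines(haplotypes_per_taxon, outgroup=None):
    """Return map-file lines ("individual<TAB>taxon"), one per phased haplotype.

    haplotypes_per_taxon: ordered mapping of taxon name -> number of phased
    haplotypes (e.g. 2 for a diploid). Individuals use a hypothetical
    "<taxon>_hap<i>" scheme so no two taxa share an individual name.
    outgroup: optional (individual, taxon) pair appended at the end.
    """
    lines = []
    for taxon, n_haps in haplotypes_per_taxon.items():
        # All haplotypes of one taxon are written as one contiguous block.
        for i in range(1, n_haps + 1):
            lines.append(f"{taxon}_hap{i}\t{taxon}")
    if outgroup is not None:
        individual, taxon = outgroup
        lines.append(f"{individual}\t{taxon}")
    return lines


if __name__ == "__main__":
    ploidy = {"diploid": 2, "triploid": 3, "tetraploid": 4}
    print("\n".join(build_map_lines(ploidy, outgroup=("out1", "out"))))
```

Redirecting the printed output into the map file keeps taxa in the order given in the dictionary; that order still has to match the order of the sequences in the data file.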
Thanks for your suggestion!
I got a very confusing result with this map.txt: the tetraploid was inferred to be the hybrid of the diploid and the triploid. For background, we already know that massive introgression or ILS may have occurred in this species. The total number of sites used for this calculation was 492,871,141, which I think should be enough to get a powerful and robust result. The estimated gamma is 0.6467936334503863.
What is your opinion?
I'm wondering if something with the ordering of the taxa in the map file and data file is causing an issue. Are the samples in the same order in both? And are all of the individuals in each taxon listed together?
Sorry for the late reply. I used the same map.txt you provided, so I think the answer to both questions is yes.
The map file I put above was just a rough example, so if the data don't match it exactly, it could definitely cause some issues. That map file also doesn't include an outgroup. I think it would be good to double-check that everything is correctly aligned and that the names of the taxa in the data file and the map file are the same. If the data seem correct and you are still getting the same result, let me know and we can take a closer look.
Sorry for the confusing reply; the map file actually looks like this:
A021 A02
A022 A02
A131 A13
A132 A13
A133 A13
A134 A13
A111 A11
A112 A11
A113 A11
ABC out
The names A021, A022, A131, A132... are the sequence names in the PHYLIP file, which looks like this:
10 sequence length
A021 CTAAACCCTAAACC
A022 -------------------
A111 CTAAACCCT
I performed the genome-scale alignment with Cactus. One thing to note is that there are many missing sites or gaps in the alignment.
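Since the alignment contains many gaps, a quick per-haplotype check of how much of each sequence is gap or missing data can show whether some haplotypes carry very little usable information. A minimal sketch, assuming `-`, `?`, `N` and `n` are the missing-data characters in the exported alignment (adjust `MISSING` to match your file); `missing_fractions` is a hypothetical helper, not part of HyDe.

```python
# Characters treated as gap/missing data (an assumption; adjust as needed).
MISSING = set("-?Nn")


def missing_fractions(records):
    """Map each sequence name to the fraction of its sites that are gaps or
    missing-data characters. `records` is an iterable of (name, sequence)."""
    fractions = {}
    for name, seq in records:
        if not seq:
            # An empty sequence carries no information at all.
            fractions[name] = 1.0
            continue
        n_missing = sum(1 for base in seq if base in MISSING)
        fractions[name] = n_missing / len(seq)
    return fractions


if __name__ == "__main__":
    toy = [("A021", "ACGT"), ("A022", "----"), ("A111", "AC-N")]
    for name, frac in missing_fractions(toy).items():
        print(f"{name}\t{frac:.2f}")
```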
Okay, so a couple of things stick out to me. First, are you putting the string "sequence length" in your PHYLIP file? If so, it needs to be a numeric value representing the actual sequence length in the input data. Second, in your example, the order of the individuals in the map file doesn't match the order in the data file: individual A111 is 3rd in the data file but 7th in the map file. The order of individuals needs to be the same in both files.
I don't think gaps should be an issue, because HyDe should be able to handle them.
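Both problems can be caught before running HyDe with a small consistency check. A minimal sketch, assuming a sequential PHYLIP file (a header line with the number of individuals and the number of sites, then one `name sequence` line per individual, with no spaces inside names) and a two-column map file; `check_hyde_inputs` is a hypothetical helper and HyDe's own parser may be stricter or more lenient.

```python
def check_hyde_inputs(phylip_lines, map_lines):
    """Compare a sequential PHYLIP alignment with a two-column map file
    (individual, taxon) and return a list of human-readable problems.
    An empty list only means these particular checks passed."""
    problems = []

    # 1. Header: two integers, number of individuals and number of sites.
    header = phylip_lines[0].split()
    if len(header) != 2 or not all(field.isdigit() for field in header):
        return ["PHYLIP header must be '<n_individuals> <n_sites>' (two integers)"]
    n_ind, n_sites = int(header[0]), int(header[1])

    # 2. One record per individual, each with exactly n_sites characters.
    records = [line.split() for line in phylip_lines[1:] if line.strip()]
    data_names = [rec[0] for rec in records]
    if len(records) != n_ind:
        problems.append(f"header says {n_ind} individuals, found {len(records)}")
    for rec in records:
        seq_len = len("".join(rec[1:]))
        if seq_len != n_sites:
            problems.append(f"{rec[0]}: {seq_len} sites, header says {n_sites}")

    # 3. Same individual names, in the same order, in both files.
    map_rows = [line.split() for line in map_lines if line.strip()]
    if any(len(row) != 2 for row in map_rows):
        problems.append("every map line needs exactly two columns: individual taxon")
        return problems
    map_names = [row[0] for row in map_rows]
    if sorted(map_names) != sorted(data_names):
        problems.append("individual names differ between map file and data file")
    elif map_names != data_names:
        problems.append("individuals are not in the same order in both files")

    # 4. All individuals of a taxon listed together (one contiguous block).
    blocks = []
    for _, taxon in map_rows:
        if not blocks or blocks[-1] != taxon:
            blocks.append(taxon)
    for taxon in sorted(set(blocks)):
        if blocks.count(taxon) > 1:
            problems.append(f"individuals of taxon {taxon} are not listed together")
    return problems


if __name__ == "__main__":
    phylip = ["3 4", "A021 ACGT", "A022 AC-T", "A111 ACGA"]
    good_map = ["A021 A02", "A022 A02", "A111 A11"]
    bad_map = ["A021 A02", "A111 A11", "A022 A02"]
    print(check_hyde_inputs(phylip, good_map))  # prints: []
    print(check_hyde_inputs(phylip, bad_map))
```

In practice the two lists would come from reading the real files line by line; the toy names above just mirror the ones in this thread.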
The "sequence length" was masked; in the real data it is the number :) I will change the order and try again, and I'll let you know as soon as I have results.
Hi @pblischak, after changing the order of individuals in the map file, I got the theoretically expected result: the triploid species was identified as the hybrid of the diploid and tetraploid species.
However, the gamma value is 0.7764855770352849. Does this mean the diploid contributed nearly 78% of the genome? I expected this number to be about 33%.
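The 33% expectation is gamete-dosage arithmetic: assuming ordinary reduced gametes, the diploid passes on one genome copy and the tetraploid two, so the diploid accounts for 1/3 of the triploid's copies and the tetraploid for 2/3. A tiny sketch of that calculation (`expected_parent_share` is a hypothetical helper, not part of HyDe):

```python
from fractions import Fraction


def expected_parent_share(gamete_copies_a, gamete_copies_b):
    """Expected fraction of the hybrid's genome copies that come from parent A,
    assuming parent A contributes `gamete_copies_a` genome copies and parent B
    contributes `gamete_copies_b` (ordinary reduced gametes = ploidy // 2)."""
    return Fraction(gamete_copies_a, gamete_copies_a + gamete_copies_b)


# Diploid (2x) -> 1x gamete, tetraploid (4x) -> 2x gamete, hybrid is 3x.
diploid_share = expected_parent_share(2 // 2, 4 // 2)
print(diploid_share, 1 - diploid_share)  # prints: 1/3 2/3
```

Note that 0.776 is closer to 2/3 than to 1/3. As far as I understand HyDe's model, γ refers to a specific parental position in the (P1, Hybrid, P2) triple, and swapping P1 and P2 turns γ into 1 - γ, so it is worth checking which parent sits in which position before reading γ as the diploid's share. The dosage value is also only an expectation; γ is estimated under HyDe's own model and may not map one-to-one onto genome dosage.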
I also ran bootstrap_hyde.py with 100 replicates, and gamma ranged from 0.33 to 0.98 across replicates.
Could you please give me some suggestions?
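To see how spread out the bootstrap estimates are, the per-replicate γ values for a triple can be summarized with a mean, median and percentile interval. A minimal sketch, assuming the γ values have already been pulled out of the bootstrap_hyde.py output into a Python list (parsing is omitted because it depends on that output format); `summarize_gamma` is a hypothetical helper and `statistics.quantiles` needs Python 3.8+.

```python
import statistics


def summarize_gamma(gammas):
    """Summarize bootstrap gamma estimates for one (P1, Hybrid, P2) triple:
    mean, median, standard deviation and a 95% percentile interval."""
    values = sorted(gammas)
    if len(values) < 2:
        raise ValueError("need at least two bootstrap replicates")
    # n=40 gives cut points every 2.5%; the first and last are the
    # 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "interval_95": (cuts[0], cuts[-1]),
    }
```

If the 95% interval spans something like 0.33 to 0.98, γ is poorly constrained for that triple, and a single point estimate such as 0.776 should be read with caution.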
Hi Paul, amazing software!
I have several questions about HyDe. I see that the input data should be diploid data or ambiguous sites. But here I have a diploid species and a tetraploid species, and through hybridization they produced a triploid species.
I have assembled phased genomes for the three species. After whole-genome alignment, I intend to test whether the triploid species was indeed generated by a cross between the diploid and tetraploid species.
Could I use all of the phased genome alignments as input? For the map file, the diploid would contain hap1 and hap2, the triploid hap1, hap2, and hap3, and so on. Is it correct to perform the analysis this way?