zhangrengang / SubPhaser

Phase, partition and visualize subgenomes of a neoallopolyploid or hybrid based on the subgenome-specific repetitive kmers.
https://doi.org/10.1111/nph.18173
GNU General Public License v3.0
52 stars 12 forks

Too few markers #9

Closed smallfishcui closed 1 year ago

smallfishcui commented 1 year ago

Hi,

I am trying to use SubPhaser to phase the subgenomes of my species. The parental species are unknown, and I built a fairly good chromosome-level assembly with 99% of BUSCOs complete. I named the subgenomes after a synteny analysis with a closely related species, Sorghum bicolor. When I ran SubPhaser, I managed to phase the subgenomes, but there seem to be very few kmer markers, and no LTR was found, far fewer than the numbers in your example files. I used these parameters: -k 13 -q 100 -f 2 -disable_ltr. Is this result trustworthy? What could be the reason?

Attachment: k13_q100_f2.0.circos.pdf

thanks, Cui

zhangrengang commented 1 year ago

@smallfishcui Yes, there do seem to be too few kmer markers. Have you tried the parameters -q 100 or -q 50? Reducing -k is not always a good idea, and -disable_ltr will result in no LTR, won't it? I would prefer that you provide all the figures. One explanation is that your species is not an extreme allopolyploid, or that the polyploidy event is too old.

smallfishcui commented 1 year ago

I ran the LTR search, but no LTR was found and the run finished incomplete, so I skipped it to get the circos plot. It looks like my species is an allopolyploid of two closely related species, with only a few chromosomes differing. That is probably why there are so few differential kmers. Here are more plots:

Attachments: k13_q100_f2.0.kmer.mat.pdf, k13_q100_f2.0.kmer_pca.pdf (upload incomplete)

I will try a longer kmer; the WGS suggests the best kmer size may be around 30. I will keep you updated.

zhangrengang commented 1 year ago

OK. Do you mean "there is no LTR" or "there is no subgenome-specific LTR"? The latter is possible because there are so few subgenome-specific kmers, in which case you can just use -disable_ltrtree. I agree with you that it may be an allopolyploid of two closely related species or populations. It looks like the case of Cleistogenes songorica described in our paper. You can try reducing -q to 20-50 to retrieve more differential kmers (though the result may be noisier).
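To illustrate the trade-off behind lowering -q, here is a minimal Python sketch of count-and-fold filtering on toy sequences. It is a simplified illustration, not SubPhaser's implementation: the function names, the toy sequences, and the exact filtering rule (total count at least `min_count`, larger count at least `min_fold` times the smaller) are assumptions made for this example.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping kmer of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def differential_kmers(seq_a, seq_b, k, min_count, min_fold):
    """Return kmers enriched in one sequence, labeled "A" or "B".

    Hypothetical rule for this sketch: keep a kmer when its total count
    is at least min_count and its larger per-sequence count is at least
    min_fold times the smaller one (a zero count is treated as 1).
    """
    counts_a, counts_b = count_kmers(seq_a, k), count_kmers(seq_b, k)
    kept = {}
    for kmer in set(counts_a) | set(counts_b):
        a, b = counts_a[kmer], counts_b[kmer]
        if a + b < min_count:
            continue  # too rare to be a repeat-derived marker
        if max(a, b) >= min_fold * max(min(a, b), 1):
            kept[kmer] = "A" if a > b else "B"
    return kept


# Toy "homoeologous chromosomes": a shared repeat, one abundant
# chromosome-specific repeat each, and one rare specific repeat each.
shared = "ATATATAT" * 5
seq_a = shared + "ACGGTC" * 30 + "TTGACA" * 3
seq_b = shared + "CCTAGG" * 30 + "GATTCA" * 3

strict = differential_kmers(seq_a, seq_b, k=5, min_count=20, min_fold=2)
loose = differential_kmers(seq_a, seq_b, k=5, min_count=2, min_fold=2)

print(len(strict), "kmers pass the strict count threshold")
print(len(loose), "kmers pass the loose count threshold")
```

In this toy case the loose threshold keeps everything the strict one keeps and also admits the rare repeats. With real data, low-count kmers of that kind are where the extra noise would be expected to come from.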

smallfishcui commented 1 year ago

Thank you, Rengang, for the prompt reply! Yes, exactly as you said, no subgenome-specific LTR was detected. I will take a closer look at the case of Cleistogenes songorica and try different kmer sizes. I will let you know how it goes shortly.

best, Cui

smallfishcui commented 1 year ago

Hi Rengang,

I've tried several kmer sizes and -q values to analyze the subgenomes, and it does indeed seem similar to the case of Cleistogenes songorica. Neither a longer kmer nor a lower count threshold improves the phasing. I suspect that parts of the subgenomes may have collapsed during assembly. Please see the plots below; they are K13_Q20, K13_Q50, K13_Q100, K30_Q50, K31_Q5 and K31_Q100:

k13_q20_f2 0 circos
k13_q50_f2 0 circos
![k13_q100_f2 0 circos](https://user-images.githubusercontent.com/41078885/198015938-cbd1bb55-533f-4004-81d5-d5903d89023d.png)
k30_q50_f2 0 circos

![k31_q5_f2 0 circos](https://user-images.githubusercontent.com/41078885/198016021-5f6eac26-ad7f-4e38-9b63-278046e5f11f.png)
k31_q100_f2 0 circos
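A parameter sweep like the one above can be scripted so that each combination is run the same way. The following Python sketch only assembles and prints the command lines; it does not run anything. The -k, -q and -f options and the six (k, q) pairs come from this thread. The -i and -c options and the file names `genome.fasta` and `sg.config` are assumptions based on the usage shown in the SubPhaser README and should be checked against `subphaser -h`.

```python
import shlex

# (k, q) pairs reported in this thread; -f was 2 in every run.
RUNS = [(13, 20), (13, 50), (13, 100), (30, 50), (31, 5), (31, 100)]


def build_command(k, q, fold=2, genome="genome.fasta", config="sg.config"):
    """Assemble one subphaser command as an argument list.

    genome and config are placeholder file names (assumptions).
    """
    return [
        "subphaser",
        "-i", genome,   # assumed input-genome option
        "-c", config,   # assumed subgenome-config option
        "-k", str(k),
        "-q", str(q),
        "-f", str(fold),
    ]


commands = [shlex.join(build_command(k, q)) for k, q in RUNS]

for command in commands:
    print(command)
```

Printing the commands first makes it easy to review them, or to pass the argument lists to `subprocess.run` once the options have been confirmed.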

zhangrengang commented 1 year ago

@smallfishcui If there are indeed many assembly errors, such as switch errors between subgenomes, they should be corrected first.

smallfishcui commented 1 year ago

Yes, that is what I am using SubPhaser for. At first I didn't know which chromosomes belonged to which subgenome, and I more or less sorted it out after running SubPhaser several times. The results are not perfect, but that is to be expected. In any case, this is not a problem with your program, and it has already helped a lot. I will proceed with the downstream analysis and get back to you if there are further questions. Thanks!

best, Cui