
zhangrengang / SubPhaser

Phase, partition and visualize subgenomes of a neoallopolyploid or hybrid based on the subgenome-specific repetitive kmers.

https://doi.org/10.1111/nph.18173

GNU General Public License v3.0


kmer 13 or less gives a lot of broken pipe errors #33

Open (opened by shreena-pradhan 5 months ago)

shreena-pradhan commented 5 months ago

Thank you for developing this pipeline.

I've noticed that for my allotetraploid species, -k 15 and -k 14 work fine, but once I try -k 13 or -k 12, I get a lot of broken pipe errors, and many of the underlying Python scripts start failing.

Could I talk to someone about this? Thanks!

zhangrengang commented 5 months ago

For hexaploid wheat, -k 9 to 13 works but -k 5 or 7 does not, because the number of subgenome-specific kmers drops dramatically from k=13 to k=9 and falls to 0 at k=7 or 5 (see Fig. S51 in our paper). So for your species, there are most likely too few subgenome-specific kmers at k=13 or 12. When there are too few subgenome-specific kmers, the downstream steps of the pipeline become meaningless and raise errors. Please check the numbers of subgenome-specific kmers and differential kmers.
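To make the "check the numbers" advice concrete, here is a minimal Python sketch (not SubPhaser's code) that counts, for toy sequences, the kmers enriched in one subgenome. The function names, the `min_total` count threshold and the `min_fold` enrichment ratio are hypothetical illustrations of the idea, not SubPhaser's actual filtering criteria or command-line options.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (forward strand only, for simplicity)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def differential_kmers(subgenomes: dict, k: int, min_total: int, min_fold: float) -> dict:
    """Return, per subgenome, the k-mers enriched in it (hypothetical criteria).

    A k-mer is kept if its total count across subgenomes is >= min_total and
    its count in the richest subgenome is >= min_fold times the count in the
    second-richest subgenome (or that second count is zero).
    """
    counts = {name: count_kmers(seq, k) for name, seq in subgenomes.items()}
    all_kmers = set().union(*counts.values())
    result = {name: [] for name in subgenomes}
    for kmer in sorted(all_kmers):
        # Counter returns 0 for a k-mer absent from a subgenome
        ranked = sorted(((c[kmer], name) for name, c in counts.items()), reverse=True)
        total = sum(n for n, _ in ranked)
        if total < min_total:
            continue  # too rare overall to be informative
        top, top_name = ranked[0]
        second = ranked[1][0] if len(ranked) > 1 else 0
        if second == 0 or top >= min_fold * second:
            result[top_name].append(kmer)
    return result


if __name__ == "__main__":
    # Toy "subgenomes": two repeats that share the AAAA core
    toy = {"A": "AAAAC" * 20, "B": "AAAAG" * 20}
    for k in (1, 2, 3, 5):
        hits = differential_kmers(toy, k, min_total=10, min_fold=2)
        print(k, {name: len(v) for name, v in hits.items()})
    # 1 {'A': 1, 'B': 1}
    # 2 {'A': 2, 'B': 2}
    # 3 {'A': 3, 'B': 3}
    # 5 {'A': 5, 'B': 5}
```

In this toy case the number of enriched kmers per subgenome shrinks as k shrinks, because short kmers such as `AAA` occur equally often in both subgenomes and can no longer tell them apart.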

shreena-pradhan commented 5 months ago

Thank you for responding! This helps a lot. I know you've run the analyses on the species I'm working on (Z. japonica) and couldn't phase the genome. When I use k ≥ 16, I get different results, but the genome is still not completely phased (i.e., not properly partitioned into subgenomes). What do you think is the reason? Do you think it's worth trying other software to see whether it works for Z. japonica?

zhangrengang commented 5 months ago

I previously failed with this species too. You could reduce -q to 50 to retrieve more differential kmers. The reason may be that the progenitors were too closely related, so their differentiation was too small at the time of hybridization, or that the hybridization event is too ancient (see more explanations). You may post your figures here. Of course it is worth trying other tools. If potential diploid ancestor genomes are available, you can try synteny- and phylogeny-based methods (e.g., here). You can also try other kmer-based methods (a review). If you eventually get a better result, please share the methods and parameters with our users.
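For intuition about why closely related progenitors yield few differential kmers, a very simplified model helps: if two homologous progenitor sequences differ only by independent substitutions at a per-site rate d, a k-mer stays identical only when all k sites match, so the expected shared fraction is (1 - d)^k. The sketch below assumes this substitution-only model (no indels, no transposable-element expansion). SubPhaser's signal comes from subgenome-specific repetitive kmers, so treat this only as a rough intuition, not as how the tool scores kmers.

```python
def expected_shared_fraction(divergence: float, k: int) -> float:
    """Expected fraction of k-mers that remain identical between two homologous
    sequences differing by independent substitutions at rate `divergence` per
    site (simplified model: no indels, no repeats). All k sites must match."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be between 0 and 1")
    return (1.0 - divergence) ** k


if __name__ == "__main__":
    # Low divergence leaves most k-mers shared, i.e. few can be subgenome-specific
    for d in (0.01, 0.05):
        for k in (13, 15, 17):
            differing = 1.0 - expected_shared_fraction(d, k)
            print(f"d={d:.2f} k={k}: {differing:.1%} of k-mers differ")
```

Under this model, a smaller divergence or a shorter k both push the differing fraction towards zero.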

shreena-pradhan commented 4 months ago

Thank you for your insight. I've been trying a couple of the other pipelines you suggested, but I also want to try subgenome phasing with WGDI. What do you suggest the outgroup should be? I'm leaning towards Oropetium since it's in the same subfamily but belongs to a different tribe. It is a diploid species.

Also, I'm sorry about communicating via this issues forum (I could email you to pick your brain if you're open to it).

zhangrengang commented 4 months ago

Synteny analyses are needed to decide which outgroup to use. A good outgroup has strong synteny with your species and clear chromosomal homologous relationships. When there is no good outgroup for reference, you can use one arbitrary subgenome as the reference for subgenome assignment, and then map the karyotype (via wgdi -km) to an outgroup to root the phylogenetic trees. Communicating via this issues forum is okay.
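One rough way to quantify "strong synteny and clear homologous relationships" when comparing candidate outgroups is to compute, for each chromosome of your species, the fraction covered by collinear blocks and the outgroup chromosome that accounts for most of that coverage. This minimal Python sketch assumes the blocks have already been parsed into hypothetical `(query_chrom, start, end, outgroup_chrom)` tuples; it does not read WGDI's output format, and any cutoff for calling an outgroup "good" is left to you.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total length covered by possibly overlapping half-open [start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def synteny_summary(blocks, chrom_lengths):
    """Summarize how well one candidate outgroup covers each query chromosome.

    blocks: iterable of (query_chrom, start, end, outgroup_chrom) tuples
            (hypothetical, already-parsed collinear blocks).
    chrom_lengths: {query_chrom: length}.
    Returns {query_chrom: (covered_fraction, dominant_outgroup_chrom or None)}.
    """
    intervals = defaultdict(list)
    per_target = defaultdict(lambda: defaultdict(int))
    for qchrom, start, end, ochrom in blocks:
        intervals[qchrom].append((start, end))
        per_target[qchrom][ochrom] += end - start
    summary = {}
    for qchrom, length in chrom_lengths.items():
        covered = merged_length(intervals.get(qchrom, []))
        targets = per_target.get(qchrom)
        best = max(targets, key=targets.get) if targets else None
        summary[qchrom] = (covered / length, best)
    return summary
```

A candidate whose chromosomes show high coverage and a single dominant partner each is a better choice than one with patchy coverage spread across many outgroup chromosomes.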