Tang-pro closed this 3 months ago
Hi, homologous isoforms are different transcripts of the same gene that originate from alternative splicing. They are expected to be highly similar in sequence, which typically shows up as the bar at Ks ~0 in a conventional Ks plot. I usually use CD-HIT to drop isoforms before building the Ks distribution for a transcriptome assembly; you can also try different cut-offs there to identify isoforms. Note that this question is not about the software wgd v2 itself but about data preparation. Another way to identify isoforms is to use the clustering results from wgd dmd and the resulting diamond hit table to perform filtering similar to what CD-HIT does. For instance, you can filter out transcripts whose normalized similarity scores are higher than 0.95 compared to other members of the same cluster (i.e., the deduced gene family), while retaining only the longest one. This way you can do the same job as CD-HIT while using similarity scores that are less biased by gene length.
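The filtering described above can be sketched roughly as follows. This is only a minimal illustration, not part of wgd v2: the function name, the dictionary inputs (clusters, lengths, scores) and the greedy longest-first strategy are all assumptions made for the example, and the normalized similarity scores are assumed to have been computed beforehand from the diamond hit table.

```python
def drop_redundant_isoforms(clusters, lengths, scores, threshold=0.95):
    """Keep the longest transcript among near-identical members of each cluster.

    All inputs are hypothetical structures for illustration:
      clusters: {family_id: [seq_id, ...]}    deduced gene families
      lengths:  {seq_id: int}                 sequence lengths
      scores:   {(seq_a, seq_b): float}       normalized similarity scores
    """
    def sim(a, b):
        # Scores may be stored in either orientation; take the larger one.
        return max(scores.get((a, b), 0.0), scores.get((b, a), 0.0))

    kept = {}
    for family, members in clusters.items():
        representatives = []
        # Visit longest sequences first so that the longest isoform is retained.
        for seq in sorted(members, key=lambda s: (-lengths[s], s)):
            # Keep a sequence only if it is not too similar to any kept one.
            if all(sim(seq, rep) <= threshold for rep in representatives):
                representatives.append(seq)
        kept[family] = representatives
    return kept
```

Lowering or raising threshold plays the same role as trying different identity cut-offs in CD-HIT.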
Hi, @heche-psb. Here I want to identify the conserved isoforms of two species. If CD-HIT clustering is used, the differences between isoforms cannot be reflected, so I want to use software specifically designed to identify alternative splicing to extract the different isoform sequences of each gene. I have a question here: doesn't WGD itself also rely on aligning gene sequences? If I run the alignment on these isoform sequences, is this solution feasible? It is difficult to compare isoforms within a species, but what about between species? Is it feasible to compare isoforms between two species? Thank you!
Hi, for "the conserved isoforms of two species", you can achieve this in two ways. The first has two steps: 1) identify the isoforms per species, then 2) compare the obtained isoforms between the two species. The second is to jointly identify the isoforms per species and the conserved isoforms between the two species, based on the sequence clustering result and similarity matrix. wgd v2 is not specifically designed for this purpose, but you could try calculating the gene length-normalized similarity scores first and then writing some custom scripts to retrieve the conserved isoforms.
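One possible shape for such a custom script is sketched below. It is only an illustration under stated assumptions: the reply does not say how conserved isoforms should be called, so a reciprocal best hit between species is used here as one simple criterion. The function name, the species mapping and the scores dictionary are hypothetical, and the normalized similarity scores are assumed to be precomputed.

```python
def reciprocal_best_pairs(scores, species, min_score=0.0):
    """Return cross-species isoform pairs that are each other's best hit.

    Hypothetical inputs for illustration:
      scores:  {(seq_a, seq_b): float}   normalized similarity scores
      species: {seq_id: species_name}    which species each isoform is from
    """
    best = {}  # seq_id -> (score, partner) for its best cross-species hit
    for (a, b), score in scores.items():
        # Ignore within-species hits and hits below the cut-off.
        if species[a] == species[b] or score < min_score:
            continue
        for query, target in ((a, b), (b, a)):
            current = best.get(query)
            # Higher score wins; ties are broken by the smaller partner ID.
            if (current is None or score > current[0]
                    or (score == current[0] and target < current[1])):
                best[query] = (score, target)

    pairs = set()
    for query, (_, target) in best.items():
        # A pair is kept only if the best hit is mutual.
        if target in best and best[target][1] == query:
            pairs.add(tuple(sorted((query, target))))
    return sorted(pairs)
```

Running the redundancy filtering per species first and then this comparison corresponds to the two-step route; the min_score cut-off is a free parameter to experiment with.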
Hi, @heche-psb
Thank you so much for taking the time to reply, it means a lot to me.
Hi, @heche-psb
Generally, homologous gene pairs are identified in comparative genomics, but here I use full-length transcriptomes to identify homologous isoforms between two species. Is it possible to do this?
Best wishes!