hillerlab / TOGA

TOGA (Tool to infer Orthologs from Genome Alignments): implements a novel paradigm to infer orthologous genes. TOGA integrates gene annotation, inferring orthologs and classifying genes as intact or lost.
MIT License

The different results of the same gene in two assemblies #171

Closed txfc closed 2 months ago

txfc commented 3 months ago

Hi authors! I use chicken as a reference to annotate other birds' genomes. However, I get different results for the same gene in two assemblies of the same species. The gene is ZFAND2A, and this is its inactivation result in GCA_013398195.1, which is the more fragmented assembly. TOGA reports the first exon as deleted and classifies the gene as 'L'. [image]

The same gene in GCA_036013445.1: [image] Here the result for the first exon is different, and the gene is 'UL' in GCF_036013445.1. ZFAND2A is annotated in the GTF of GCF_036013445.1, but no RNA-seq reads align to its exons. [image] After aligning the gene between the two assemblies, I do not find any large difference in the sequence of the first exon. However, I noticed that the soft-masking may differ between the two assemblies. Why is the result different for the same gene?

Looking forward to your reply. I would appreciate it very much. Best regards!

MichaelHiller commented 3 months ago

Hi,

if I understand correctly, the question is about this exon [image], which is partially deleted in GCA_013398195, as shown by the chains. The chains show that there is no assembly gap in the orange locus, therefore TOGA counts exon 1 as deleted (of course it partially aligns, so this is kind of a borderline case).

I don't have the alignment to the VGP assembly, so I can't look at the chains. But if the seq around this first exon is identical between the old and VGP assembly, then yes, the difference in softmasking could explain why lastz does not find alignments in the old assembly. However, in the old assembly, the softmasked region is ~420 bp upstream of this exon, so I don't think it is a masking issue.

Are you using the same TOGA version (always 1.0 or 1.1.4)?

In any case, given the other frameshifts (FS), this gene looks lost to me.

txfc commented 3 months ago

Thank you for your reply! Yes, I use the same TOGA version for both assemblies, and it is the latest version (1.1.7).

MichaelHiller commented 3 months ago

OK, then different masking could be the reason. But again, the gene is likely lost (I think). Importantly, it can still be expressed (producing a non-coding RNA). See Hecker 2019 for examples.
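To check whether soft-masking really differs around the first exon, one option is to extract the same region from both assemblies and compare where the lowercase (soft-masked) bases fall. The snippet below is only a minimal sketch: the function names and the toy sequences are hypothetical and not part of TOGA or lastz, and it assumes the region has already been extracted as a plain string from each soft-masked FASTA.

```python
def softmasked_intervals(seq):
    """Return 0-based, half-open (start, end) runs of lowercase bases.

    Lowercase bases are treated as soft-masked; this includes lowercase 'n'.
    """
    intervals = []
    start = None
    for i, base in enumerate(seq):
        if base.islower():
            if start is None:
                start = i  # a masked run begins here
        elif start is not None:
            intervals.append((start, i))  # the masked run ended just before i
            start = None
    if start is not None:
        intervals.append((start, len(seq)))  # run extends to the sequence end
    return intervals


def masked_fraction(seq):
    """Fraction of bases in seq that are soft-masked (0.0 for an empty string)."""
    if not seq:
        return 0.0
    masked = sum(end - start for start, end in softmasked_intervals(seq))
    return masked / len(seq)


if __name__ == "__main__":
    # Toy stand-ins for the exon 1 region of each assembly (not real data).
    old_region = "acgtacgtACGTACGTACGT"
    new_region = "ACGTACGTACGTacgtACGT"
    for name, region in (("old", old_region), ("new", new_region)):
        print(name, softmasked_intervals(region), masked_fraction(region))
```

If the masked intervals overlap the exon in only one of the two assemblies, that would be consistent with masking explaining the missing alignment; if they sit well outside the exon in both, another cause is more likely.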

txfc commented 3 months ago

Thanks for your reply. Best regards!