Hi there,
I've been running some annotation tests with DFAST on a collection of MAGs, and I noticed that in some cases a huge number of partial pseudogenes are detected, sometimes close to 20% of all called CDS! Most MAGs in the collection don't belong to any known species or genus (even within the GTDB), so I tested one MAG from a known genus, adding just two known relatives, and the number of partial genes more than halved (589 vs 242 partial genes).
Even though these are MAGs, the number of partial genes located at the ends of contigs is relatively small (16 in this specific genome).
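For anyone who wants to reproduce this kind of count, here is a minimal sketch of how partial genes could be split into "at a contig end" and "internal". It is not taken from DFAST: the function name `split_by_contig_edge`, the `(contig, start, end)` tuple layout, and the `margin` parameter are all hypothetical, and the coordinates are assumed to be 1-based and inclusive, as in GFF.

```python
from typing import Dict, List, Tuple

# Hypothetical feature layout: (contig_id, start, end), 1-based inclusive.
Feature = Tuple[str, int, int]


def split_by_contig_edge(
    features: List[Feature],
    contig_lengths: Dict[str, int],
    margin: int = 3,
) -> Tuple[List[Feature], List[Feature]]:
    """Split features into those touching a contig end and those that do not.

    A feature counts as "at the edge" if it starts within the first `margin`
    bases of its contig or ends within the last `margin` bases.
    """
    at_edge: List[Feature] = []
    internal: List[Feature] = []
    for contig, start, end in features:
        length = contig_lengths[contig]
        if start <= margin or end > length - margin:
            at_edge.append((contig, start, end))
        else:
            internal.append((contig, start, end))
    return at_edge, internal


# Toy data, made up for illustration only.
partial_genes = [
    ("contig_1", 1, 300),       # starts at the contig start
    ("contig_1", 500, 900),     # internal
    ("contig_1", 9800, 10000),  # runs to the contig end
    ("contig_2", 50, 400),      # internal
]
lengths = {"contig_1": 10000, "contig_2": 5000}

edge, inner = split_by_contig_edge(partial_genes, lengths)
print(len(edge), "partial genes at contig ends;", len(inner), "internal")
```

Partial genes that fall in the internal group cannot be explained by contig truncation, which is why they are the ones that seem to depend on how well the reference set covers the lineage.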
Is there a way to separate the detection of pseudogenes caused by frameshifts or internal stop codons from the detection of partial genes? I ask because pseudogene detection appears as a single process in the config file. I think that detecting translation exceptions (selenocysteine/pyrrolysine) or frameshifts might still be globally useful, but partial gene detection seems like a slippery slope when applied to MAGs or genomes of poorly characterised lineages.
Currently, the logic for pseudogene annotation and the detection of translation exceptions are closely tied to each other, so they cannot be separated.
Cheers,