rbeinart closed this issue 7 years ago.
@rbeinart Prokka doesn't really do any pseudo-gene detection. This is why I don't put any annotations about pseudo-genes in the output files; I only print them to the log. My "detection" is very primitive: if I find two sequential genes with the same /product, and that /product is not "hypothetical protein", I print them out. That is ALL I do.
I have written a standalone script you can use to replicate it. You need to provide it with the .faa file that Prokka produces:
https://raw.githubusercontent.com/tseemann/bioinfo-scripts/master/bin/prokka-suggest_pseudogenes.pl
I'll also paste it inline below:
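#!/usr/bin/env perl
use strict;
use warnings;

# Usage: prokka-suggest_pseudogenes.pl <prokka.faa>
@ARGV or die "Usage: $0 <prokka.faa>\n";

# Products too generic to suggest that two genes came from one split gene
my %ignore = ('hypothetical protein' => 1);

# Collect [locus_tag, product] pairs from the FASTA headers, in file order
my @gene;
while (<ARGV>) {
  next unless m/^>(\S+)\s+(.+)$/;
  push @gene, [ $1, $2 ];
}
my $N = scalar(@gene);
print STDERR "Found $N genes.\n";

# Report each pair of adjacent genes sharing the same (non-ignored) product.
# The loop stops at $N-1 so $gene[$i] never runs past the last gene.
my $P = 0;
if ($N > 1) {
  for my $i (1 .. $N-1) {
    my $prod = $gene[$i-1][1];
    if ( !$ignore{$prod} and $gene[$i][1] eq $prod ) {
      print "$gene[$i-1][0] & $gene[$i][0] => $prod\n";
      $P++;
    }
  }
}
print STDERR "Found $P potential pseudo-genes\n";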
Hi @tseemann,
When two pseudogene calls have the same gene name and directly follow each other, are they a single gene that may have been split by a mutation and was therefore called as two separate genes?
Secondly, is there any way of determining which kinds of mutations occurred in the pseudogenes Prokka called?
Thank you
I'd like to plug a new tool designed for pseudogene detection: software and preprint
Pseudofinder accepts GenBank-format files generated by Prokka (with the --compliant flag). It can detect 'fragmented' pseudogenes, which appear as two or more sequential genes that seem to derive from a single ancestral gene.
Torsten, thanks for the great software, which I and many others find extremely useful.
Thanks @Arkadiy-Garber
Pseudofinder has been helpful
Hello,
I'm interested in running the pseudogene detection on gene calls that I have already made with Prodigal. Is that possible? When I run Prokka on its own, its gene calls don't exactly match my Prodigal calls. Is there any way to use Prokka's putative pseudogene detection independently of the rest of the pipeline?
Best, -Roxanne-