jaumlrc / ProphET

Investigate the possibility of taking output of Prokka #11

Open gustavo11 opened 7 years ago

LindseyBohr commented 7 years ago

When I try to use prokka gff output currently, I get an error along the lines of

The transcript XXXX does not seem to have a parent

Is there any way for me to bypass or fix this?

gustavo11 commented 6 years ago

Dear @LindseyBohr, sorry for the delay. I have uploaded a GFF format converter that might work for converting Prokka output to the format accepted by ProphET. Please see the instructions in ProphET's README.md file. Let me know if it doesn't work and I will address the issue using a different strategy.

tseemann commented 6 years ago

@gustavo11 Prokka author here :)
What gene model structure are you expecting in the GFF file? GFF3 or GTF(2.5) ?

gustavo11 commented 6 years ago

We are using the GFF3 format as defined here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

tseemann commented 6 years ago

Bacterial GFF files do not usually use the full example gene model described in the GFF3 spec. I would suggest you support the Prokka and NCBI style GFF3 files.
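
To make the mismatch concrete, here is a minimal Perl sketch that lists features lacking a Parent attribute, which is presumably what triggers the "does not seem to have a parent" error reported above. The sample records, IDs, and the helper names `parse_attributes` and `orphan_features` are all hypothetical; the flat CDS records are only an assumption about what a Prokka-style file looks like, and none of this is ProphET's or GFFLib's actual code.

```perl
use strict;
use warnings;

# Hypothetical GFF3 records: two flat CDS features without a Parent, followed
# by a gene -> mRNA -> CDS hierarchy in the style of the GFF3 spec's example.
my @lines = (
    "##gff-version 3",
    join("\t", 'contig1', 'example', 'CDS',  100,  400,  '.', '+', '0', 'ID=cds_a;product=hypothetical protein'),
    join("\t", 'contig1', 'example', 'CDS',  500,  900,  '.', '-', '0', 'ID=cds_b;product=hypothetical protein'),
    join("\t", 'contig1', 'example', 'gene', 1000, 1600, '.', '+', '.', 'ID=gene1'),
    join("\t", 'contig1', 'example', 'mRNA', 1000, 1600, '.', '+', '.', 'ID=mrna1;Parent=gene1'),
    join("\t", 'contig1', 'example', 'CDS',  1000, 1600, '.', '+', '0', 'ID=cds_c;Parent=mrna1'),
);

# Split column 9 ("key=value;key=value") into a hash reference.
sub parse_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value if defined $value;
    }
    return \%attr;
}

# Return the IDs of features of the given types that carry no Parent attribute.
sub orphan_features {
    my ($records, @types) = @_;
    my %wanted = map { $_ => 1 } @types;
    my @orphans;
    for my $line (@$records) {
        next if $line =~ /^#/ or $line !~ /\S/;    # skip directives and blank lines
        my @col = split /\t/, $line;
        next unless @col == 9 and $wanted{ $col[2] };
        my $attr = parse_attributes($col[8]);
        next if exists $attr->{Parent};
        push @orphans, ($attr->{ID} // '(no ID)');
    }
    return @orphans;
}

my @orphans = orphan_features(\@lines, 'CDS', 'mRNA');
print "features without a Parent: @orphans\n";
```

A parser that insists on the full hierarchy would reject the first two records, so supporting flat files means either accepting them directly or synthesizing the missing parent features first.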

nickp60 commented 6 years ago

The gff_rewrite.pl tool works for most Prokka-generated GFFs (although I had to re-run some without the --compliant flag, as pipes in the FASTA header were being interpreted as pipes in commands, I think around line 167). I tried to add a system call to the main ProphET_standalone.pl script for when parsing the GFF fails:

# Processing the input files and separating in one fasta per GFF
# Get the scaffold IDs from the gff
# (try/catch with $_ and a trailing semicolon assumes "use Try::Tiny;" in the script)
my $gff_handler = GFFFile::new($gff_in);
try {
    $gff_handler->read();
} catch {
    warn "caught error: $_";
    # Parsing failed: rewrite the GFF with gff_rewrite.pl and parse the result instead
    my @args = ("$UTILS_DIR/GFFLib/gff_rewrite.pl", "--input", "$gff_in", "--output", "$oudir/tmp.gff", "--add_missing_features");
    system(@args) == 0
        or die "system @args failed: $?";
    # No "my" here: assign to the outer $gff_handler so the code after the block sees it
    $gff_handler = GFFFile::new("$oudir/tmp.gff");
    $gff_handler->read();
};
my @scaffold_ids = $gff_handler->get_chrom_names();

but I got the Can't locate GFFFile.pm in @INC error when it tries to run gff_rewrite.pl, despite the use lib "$FindBin::Bin/UTILS.dir/GFFLib" line up top. Any tips on how I could get this sorted so I could submit a pull request?
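
On the pipe problem mentioned at the start of this comment: if the cause is what nickp60 suspects, namely a command assembled as one string and handed to a shell, then the list form of `system` or `open` sidesteps it, because no shell ever parses the arguments. The sketch below only illustrates that behaviour; the sequence ID is a made-up example and nothing here is taken from gff_rewrite.pl.

```perl
use strict;
use warnings;

# Hypothetical sequence ID containing pipe characters.
my $header = 'gnl|Prokka|contig_1';

# List-form pipe open: the arguments go straight to the child process with
# no shell in between, so '|' is just an ordinary character.
open(my $out, '-|', $^X, '-e', 'print $ARGV[0]', $header)
    or die "cannot start child: $!";
my $echoed = do { local $/; <$out> };    # slurp everything the child printed
close($out) or die "child failed: $?";

print "child received: $echoed\n";
```

The string form, for example backticks around an interpolated command, would instead let the shell treat each `|` as a pipeline separator unless the argument is quoted.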

nilesh-tawari commented 4 years ago

@nickp60 Try setting PERL5LIB before running the script, e.g. 'export PERL5LIB=$UTILS_DIR/GFFLib'
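
Why PERL5LIB can help here: `use lib` only changes `@INC` inside the process that executes it, whereas environment variables are inherited by any child started with `system`. The sketch below demonstrates that general Perl behaviour with a throwaway module. `DemoLib`, the temporary directory, and the helper `child_can_load` are hypothetical stand-ins, and whether this is the actual cause of the error in ProphET is an assumption.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for the GFFLib directory: a temp dir holding one module.
my $lib_dir = tempdir(CLEANUP => 1);
my $module  = File::Spec->catfile($lib_dir, 'DemoLib.pm');
open(my $fh, '>', $module) or die "cannot write $module: $!";
print {$fh} "package DemoLib;\n1;\n";
close($fh) or die "cannot close $module: $!";

# Child code: exit 0 if DemoLib can be loaded, 2 otherwise.
my $child_code = 'exit(eval { require DemoLib; 1 } ? 0 : 2)';

# Start a fresh perl with PERL5LIB set to $perl5lib and optional extra switches.
sub child_can_load {
    my ($perl5lib, @switches) = @_;
    local $ENV{PERL5LIB} = $perl5lib;    # restored when the sub returns
    return system($^X, @switches, '-e', $child_code) == 0;
}

unshift @INC, $lib_dir;    # runtime equivalent of "use lib" in the parent
my $parent_ok   = eval { require DemoLib; 1 } ? 1 : 0;
my $plain_child = child_can_load('');                  # parent's @INC is not inherited
my $env_child   = child_can_load($lib_dir);            # PERL5LIB is inherited
my $flag_child  = child_can_load('', "-I$lib_dir");    # -I sets the child's @INC directly

printf "parent: %s, child: %s, child with PERL5LIB: %s, child with -I: %s\n",
    map { $_ ? 'loaded' : 'not found' }
    $parent_ok, $plain_child, $env_child, $flag_child;
```

Applied to the snippet above, the same idea would be a `local $ENV{PERL5LIB} = "$UTILS_DIR/GFFLib";` placed just before `system(@args)`, which keeps the fix inside the script instead of relying on the caller's shell environment.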