nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

/venv/lib/python3.8/site-packages/funannotate/aux_scripts/filterIntronsFindStrand.pl FASTA parsing failure in Docker 1.8.16 #990

Closed MichaelFokinNZ closed 5 months ago

MichaelFokinNZ commented 6 months ago

Very strange, but FASTA parsing into the %annos hash doesn't work properly in /venv/lib/python3.8/site-packages/funannotate/aux_scripts/filterIntronsFindStrand.pl. As a result, introns are not allocated correctly, "1" values are assigned to all of them, and this warning is triggered: WARNING: '$seqname' does not match any sequence in the fasta file. Maybe the two files do not belong together

It works flawlessly when the FASTA parser is replaced with a simple custom-written parser, as below:

open (FASTA, "<".$genome) or die "Cannot open file: $genome\n";
while(<FASTA>) {
    chomp;
    if (/^>(\S+)/) {
        $seqname = $1;
    }
    elsif (/^(\S+)/) {
        $annos{$seqname} .= $1;
    }
}
close(FASTA) or die("Could not close fasta file $genome!\n");

hyphaltip commented 6 months ago

That's an augustus script not funannotate tho: https://github.com/Gaius-Augustus/BRAKER/blob/master/scripts/filterIntronsFindStrand.pl

MichaelFokinNZ commented 6 months ago

@hyphaltip and that is even stranger :( Sorry, it's not very clear to me: are those introns passed to augustus in funannotate? I'm desperately trying to get RNA-seq data into the annotation, but CodingQuarry is also not working :(

PS. I am not fluent in Perl and regex, but apparently this part of the original script didn't handle my FASTA files (I tried a few) properly:

open (FASTA, "<".$genome) or die "Cannot open file: $genome\n";
$/="\n>";
while(<FASTA>) {
  /[>]*(.*)\n/;
  $seqname = $1;
  $seq = $';
  $seq =~ s/>//; 
  $seq =~ s/\n//g;
  $annos{$seqname} = $seq;
}

hyphaltip commented 6 months ago

I'm really not sure. I run it all the time without errors, so I don't know what the issue is here. It would be helpful if you provided a test case that reproduces the problem.

MichaelFokinNZ commented 5 months ago

Ignore that. A Windows line-ending character (carriage return) sneaked into the contig names in the FASTA file.
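
For anyone hitting the same warning, here is a minimal Python sketch of why the two parsers quoted above probably behave differently on a FASTA file with Windows (CRLF) line endings. It is not code from funannotate or BRAKER: the function names `parse_by_record`, `parse_by_line` and `normalize_line_endings` are hypothetical, and the two parsers only imitate the logic of the Perl snippets. The record-based parser keeps everything up to the newline as the sequence name, so a trailing "\r" ends up in the key; the line-based parser captures only non-whitespace characters, so the "\r" is dropped.

```python
import re


def parse_by_record(text):
    """Imitates the original script: split on "\\n>" and use the whole
    first line of each record as the sequence name."""
    annos = {}
    for record in text.split("\n>"):
        record = record.lstrip(">")  # the first record still carries its ">"
        if not record.strip():
            continue
        name, _, seq = record.partition("\n")
        annos[name] = seq.replace("\n", "")  # a "\r" would survive here
    return annos


def parse_by_line(text):
    """Imitates the custom parser: read line by line and capture only
    the leading run of non-whitespace characters (so no "\\r")."""
    annos = {}
    name = None
    for line in text.split("\n"):
        header = re.match(r">(\S+)", line)
        if header:
            name = header.group(1)
            annos.setdefault(name, "")
            continue
        body = re.match(r"(\S+)", line)
        if body and name is not None:
            annos[name] += body.group(1)
    return annos


def normalize_line_endings(text):
    """Convert CRLF line endings to LF before parsing."""
    return text.replace("\r\n", "\n")


# A tiny made-up FASTA with Windows line endings.
crlf_fasta = ">contig1\r\nACGT\r\nTTGA\r\n>contig2\r\nGGCC\r\n"

print([repr(k) for k in parse_by_record(crlf_fasta)])
print([repr(k) for k in parse_by_line(crlf_fasta)])
print([repr(k) for k in parse_by_record(normalize_line_endings(crlf_fasta))])
```

With names like "contig1\r" in the hash, a lookup by the clean name from the hints file cannot match, which is consistent with the warning above. Note that the line-based variant also truncates headers at the first whitespace, so the two parsers are not equivalent even on clean input. Converting the genome file to Unix line endings before running the pipeline avoids the problem altogether.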