robinohm opened this issue 6 years ago (status: Open)
Sadly, promer has not seen as much "love" as the rest of the programs. It is still the old Perl script.
Can you send me two small fasta files that do not work?
Thanks for looking into this. I have attached two fasta files of yeast genomes that result in the error mentioned above (except with a different scaffold name). (https://github.com/mummer4/mummer/files/1780321/fasta_files.zip)
Hi, I have the same problem. I used the example fasta files downloaded from the MUMmer 3 manual page (http://mummer.sourceforge.net/examples/#promer).
Thanks, Yun
Hi there, I am seeing the same. Would be great if you could take a look at the issue.
Cheers
Phil
Dear @gmarcais, I can confirm this error on multiple systems as well. From some initial testing with several different genome combinations, it appears to be an issue of multi-fasta files versus single-fasta files. Does that help?
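To check whether a given input is multi-fasta or single-fasta, counting header lines is usually enough. This is a minimal sketch; count_fasta_records is a hypothetical helper name and not part of MUMmer.

```shell
# Count FASTA records (lines starting with '>') in a file.
# count_fasta_records is a hypothetical helper, not a MUMmer tool.
count_fasta_records() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c '^>' "$1" || true
}

# Toy example: build a two-record file and classify it.
tmp=$(mktemp)
printf '>scaffold_0001\nACGT\n>scaffold_0002\nGGCC\n' > "$tmp"
n=$(count_fasta_records "$tmp")
if [ "$n" -gt 1 ]; then
    echo "multi-fasta ($n records)"
else
    echo "single-fasta ($n record)"
fi
rm -f "$tmp"
```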
Dear @gmarcais, I can now confirm that this issue arises from unsorted records in the mgaps file that postpro reads as input. Sorting these (by fasta header, whilst retaining structure) resolved the problem (for me at least). Hope this helps. Regards.
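A quick way to see whether an mgaps file follows the query order is to compare the order of sequence IDs in both files. The sketch below assumes mgaps headers look like "> name.N" (the format the patch later in this thread parses); the helper names and the toy data lines are made up for illustration.

```shell
# mgaps_id_order FILE: print base sequence IDs in order of first appearance,
# assuming header lines such as "> scaffold_0002.3" (an assumption).
mgaps_id_order() {
    awk '/^>/ {
        sub(/^>[ \t]*/, "")          # drop ">" and optional blanks
        id = $1
        sub(/\.[0-9]+$/, "", id)     # drop the trailing ".N" suffix
        if (!(id in seen)) { seen[id] = 1; print id }
    }' "$1"
}

# fasta_id_order FILE: print the first word of every FASTA header, in order.
fasta_id_order() {
    awk '/^>/ { sub(/^>/, ""); print $1 }' "$1"
}

# Toy example: the mgaps file lists s2 before s1, unlike the query.
q=$(mktemp); m=$(mktemp)
printf '>s1\nACGT\n>s2\nGGCC\n' > "$q"
printf '> s2.1\n1 1 10\n> s1.1\n> s1.2\n' > "$m"
if [ "$(fasta_id_order "$q")" = "$(mgaps_id_order "$m")" ]; then
    echo "same order"
else
    echo "order differs"
fi
rm -f "$q" "$m"
```

The comparison is only meaningful if every query sequence has a header in the mgaps file, which this sketch does not verify.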
Hi there, is this solved please? I am encountering the same problem. It would be great if someone could post a solution please?
Me too, I encountered the same problem. There is no problem with nucmer on the same input files, but promer produces the error message.
I also encountered this and partly solved the problem as suggested by rdejonge. It seemed that the error occurred when postpro was reading a fasta sequence whose header format the program might not accept. If this occurs again, it may be enough to rename all the headers to keep them uniform.
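Renaming headers uniformly can be done with a short awk filter. This is only a sketch: rename_fasta_headers and the PREFIX_0001 naming scheme are assumptions for illustration, and the original names are discarded, so keep a copy of the input if you need to map results back.

```shell
# rename_fasta_headers PREFIX < in.fasta > out.fasta
# Replaces every header with PREFIX_0001, PREFIX_0002, ... in input order.
# Hypothetical helper; sequence lines pass through unchanged.
rename_fasta_headers() {
    awk -v prefix="$1" '
        /^>/ { n++; printf(">%s_%04d\n", prefix, n); next }
        { print }
    '
}

# Toy example with irregular headers.
printf '>chr I [organism=yeast]\nACGT\n>chr II\nGGCC\n' | rename_fasta_headers seq
```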
How to fix it? Could you give some details, @rdejonge?
The old version, MUMmer3, works.
Following rdejonge's advice, I wrote a sub, sort_mgaps, in promer that sorts the mgaps file. I call the sub before postpro. This appears to work for me.
Add a line after line 348:
sort_mgaps("$qry_file","$pfx.mgaps");
The sub looks like:
#-- END OF SCRIPT
sub sort_mgaps {
    # Input: file names, as strings
    my $qry = shift;
    my $mgaps = shift;

    # read the query file and record the sequence IDs in file order
    my @qry_entries = ();
    my $fh;
    open($fh, "<", $qry) || die "couldn't read file $qry\n";
    while(my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^\>(\S+)/;
        push(@qry_entries, $1);
    }
    close($fh);

    # read the mgaps file
    my %mgaps_lines = (); # header and match lines of each entry
    my %dna_pep = (); # used to look up the peptides in a dna seq
    open($fh, "<", $mgaps) || die "couldn't read file $mgaps\n";
    my $cur_entry = ""; # e.g. 3210101.5
    while(my $line = <$fh>) {
        chomp $line;
        if($line =~ /^\>\s*(\S+)/) {
            $cur_entry = $1;
            # find out the corresponding dna id in query file
            # (\d+ rather than [\d+], which would match only one character)
            die "wrong format $cur_entry\n"
                unless $cur_entry =~ /^(.+)\.\d+$/;
            my $dna = $1;
            push(@{$dna_pep{$dna}}, $cur_entry);
        }
        # store the line (header or match) under the current entry
        push(@{$mgaps_lines{$cur_entry}}, "$line\n");
    }
    close($fh);

    # rewrite the mgaps file with its entries in query-file order
    open($fh, ">", $mgaps) || die "couldn't write to file $mgaps\n";
    foreach my $dna (@qry_entries) {
        foreach my $pep (@{ $dna_pep{$dna} || [] }) {
            print {$fh} @{ $mgaps_lines{$pep} };
        }
    }
    close($fh);
}
Thanks @jerviedog! I can confirm that this patch worked for me
Thank you from me as well, @jerviedog! I too can confirm that this works.
@jerviedog: are you planning to create a pull request? In case you lack time to do this and if you don't mind I can create one...
@hermannschwaerzlerUIBK Please do create a pull request if that helps. BTW, I actually don't know how to create one.
In my case, the query file was in DOS End of Line format (CRLF).
Removing the CR to get Unix format (LF) solved the problem for me.
Try it with e.g. bash command:
tr -d '\15\32' < winfile.txt > unixfile.txt
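Before converting, it can help to confirm that the file really contains carriage returns. A minimal sketch follows; count_cr_lines is a hypothetical helper name, and the conversion step here strips only CR, whereas the command above also removes the DOS end-of-file character.

```shell
# count_cr_lines FILE: print the number of lines containing a carriage return.
# Hypothetical helper; a non-zero count suggests DOS (CRLF) line endings.
count_cr_lines() {
    cr=$(printf '\r')
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c "$cr" "$1" || true
}

# Toy example: a CRLF file before and after stripping the CR characters.
tmp=$(mktemp)
printf '>seq1\r\nACGT\r\n' > "$tmp"
count_cr_lines "$tmp"          # prints 2
tr -d '\r' < "$tmp" > "$tmp.unix"
count_cr_lines "$tmp.unix"     # prints 0
rm -f "$tmp" "$tmp.unix"
```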
First set a shell variable:
filename=filename.fasta
Then run:
dos2unix "$filename"
It will help.
Ran my command:
ERROR: Could not parse input from 'Query File'. Please check the filename and format, or file a bug report
ERROR: postnuc returned non-zero
ianke@DESKT:/mnt/d/Projects/Ca$ dos2unix $input
dos2unix: converting file referencefasta.fasta to Unix format...
Ran my command:
4: FINISHING DATA
The above fix from @jerviedog should be added to the develop branch too, @gmarcais, please! dos2unix was not sufficient to fix the issue for me. I think this is yet another time I will need to edit the MUMmer source code and recompile to get it to work :( and it is getting very maddening.
I'm trying mummer4 (4.0.0.beta2), but I'm running into a problem with promer.
$ promer --mum reference.fasta query.fasta
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
4: FINISHING DATA
scaffold_0002
ERROR: Could not parse input from 'Query File'. Please check the filename and format, or file a bug report
ERROR: postpro returned non-zero
Nucmer and mummer don't give any errors with these files. Promer version 3 also works on these files, so as far as I can tell the fasta files are fine. This error happens on several machines with a variety of fasta files. Am I doing something wrong? Your advice would be appreciated!