mummer4 / mummer

Mummer alignment tool
Artistic License 2.0

Promer error #55

Open robinohm opened 6 years ago

robinohm commented 6 years ago

I'm trying mummer4 (4.0.0.beta2), but I'm running into a problem with promer.

$ promer --mum reference.fasta query.fasta
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
4: FINISHING DATA
scaffold_0002
ERROR: Could not parse input from 'Query File'. Please check the filename and format, or file a bug report
ERROR: postpro returned non-zero

Nucmer and mummer don't give any errors with these files. Promer version 3 also works on these files, so as far as I can tell the fasta files are fine. This error happens on several machines with a variety of fasta files. Am I doing something wrong? Your advice would be appreciated!

gmarcais commented 6 years ago

Sadly, promer has not seen as much "love" as the rest of the programs. It is still the old Perl script.

Can you send me two small fasta files that do not work?

robinohm commented 6 years ago

Thanks for looking into this. I have attached two fasta files of yeast genomes that result in the error mentioned above (except with a different scaffold name). (https://github.com/mummer4/mummer/files/1780321/fasta_files.zip)

liyun831229 commented 6 years ago

Hi, I have the same problem. I used the example fasta files downloaded from the MUMmer 3 manual page (http://mummer.sourceforge.net/examples/#promer).

Thanks, Yun

evolgenomology commented 6 years ago

Hi there, I am seeing the same. Would be great if you could take a look at the issue.

Cheers

Phil

rdejonge commented 6 years ago

Dear @gmarcais, I can confirm this error on multiple systems as well. From some initial testing with several different genome combinations, it appears to be an issue of multi-fasta files versus single-fasta files. Does that help?

rdejonge commented 6 years ago

Dear @gmarcais, I can now confirm that this issue arises from unsorted output in the mgaps file that postpro takes as input. Sorting the entries (by fasta header, whilst retaining structure) resolved the problem (for me at least). Hope this helps. Regards.
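To check whether an mgaps file is affected, its header order can be compared with the query file. The sketch below is not part of MUMmer: `mgaps_in_query_order` is a hypothetical helper, and it assumes that each mgaps header has the form `> ID.N`, where `ID` is the first word of a query fasta header and `N` is a numeric suffix.

```shell
#!/bin/sh
# Sketch only: mgaps_in_query_order is a hypothetical helper, not a MUMmer tool.
# Usage: mgaps_in_query_order QUERY_FASTA MGAPS_FILE
# Exit status 0 if the mgaps headers follow the query order, 1 otherwise.
mgaps_in_query_order() {
    awk '
        # First file (query fasta): remember the position of each sequence ID.
        NR == FNR {
            if (/^>/) { id = $1; sub(/^>/, "", id); rank[id] = ++n }
            next
        }
        # Second file (mgaps): reduce each header to its query sequence ID.
        /^>/ {
            id = $0
            sub(/^>[ \t]*/, "", id)    # drop ">" and optional blanks
            sub(/[ \t].*$/, "", id)    # keep only the first word
            sub(/\.[0-9]+$/, "", id)   # drop the assumed ".N" suffix
            # Unknown ID, or an ID that comes earlier in the query than the previous one.
            if (!(id in rank) || rank[id] < last) { bad = 1; exit }
            last = rank[id]
        }
        END { exit bad + 0 }
    ' "$1" "$2"
}
```

For example, `mgaps_in_query_order query.fasta out.mgaps || echo "mgaps file is not in query order"`, where both file names are placeholders.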

ishengtsai commented 6 years ago

Hi there, is this solved please? I am encountering the same problem. It would be great if someone could post a solution please?

aloy3c commented 6 years ago

Me too, I encountered the same problem. There is no problem with nucmer on the same input files, but promer brings up the error message.

minky commented 6 years ago

I also encountered this and partly solved the problem as suggested by rdejonge. It seems that the error occurs when postpro reads a fasta sequence whose header format the program might not accept. If this occurs again, maybe just rename all the headers to keep them uniform.
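One way to make the headers uniform is to replace each one with a simple numbered name. This is only a sketch: `rename_headers` is a hypothetical helper, the `seq` prefix is an arbitrary choice, and nothing in this thread confirms which header formats postpro rejects. The original names are discarded, so keep the input file.

```shell
#!/bin/sh
# Sketch only: rename_headers is a hypothetical helper, not a MUMmer tool.
# Usage: rename_headers IN_FASTA OUT_FASTA [PREFIX]
rename_headers() {
    awk -v prefix="${3:-seq}" '
        /^>/ { printf(">%s%d\n", prefix, ++n); next }   # numbered replacement header
        { print }                                       # sequence lines are copied as-is
    ' "$1" > "$2"
}
```

For example, `rename_headers query.fasta query.renamed.fasta` writes headers `>seq1`, `>seq2`, and so on.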

ATPs commented 6 years ago

how to fix it? some details? @rdejonge

ATPs commented 6 years ago

The old version (MUMmer3) works.

jerviedog commented 5 years ago

Following rdejonge's advice, I wrote a sub, sort_mgaps, in promer that sorts the mgaps file. I call the sub before postpro. This appears to work for me.

Add a line after line 348:

sort_mgaps("$qry_file","$pfx.mgaps");

The sub looks like:

#-- END OF SCRIPT
sub sort_mgaps {
    # Arguments: query fasta file name and mgaps file name (strings)
    my ($qry, $mgaps) = @_;

    # Read the query file and record the sequence IDs in file order
    my @qry_entries;
    open(my $fh, '<', $qry) or die "couldn't read file $qry\n";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^>(\S+)/;
        push @qry_entries, $1;
    }
    close($fh);

    # Read the mgaps file, grouping its lines by header entry
    my %mgaps_lines;     # entry ID => lines of its block, header first
    my %dna_pep;         # used to look up the peptides in a dna seq
    my $cur_entry = "";  # e.g. 3210101.5
    open($fh, '<', $mgaps) or die "couldn't read file $mgaps\n";
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>\s*(\S+)/) {
            $cur_entry = $1;
            # Find the corresponding dna ID in the query file
            # by stripping the trailing ".<number>"
            die "wrong format $cur_entry\n"
                unless $cur_entry =~ /^(.+)\.\d+$/;
            my $dna = $1;
            push @{ $dna_pep{$dna} }, $cur_entry;
        }
        # Header and data lines alike belong to the current entry
        push @{ $mgaps_lines{$cur_entry} }, "$line\n";
    }
    close($fh);

    # Rewrite the mgaps file with its entries in query-file order
    open($fh, '>', $mgaps) or die "couldn't write to file $mgaps\n";
    foreach my $dna (@qry_entries) {
        foreach my $pep (@{ $dna_pep{$dna} || [] }) {
            print $fh @{ $mgaps_lines{$pep} };
        }
    }
    close($fh);
}

robinohm commented 5 years ago

Thanks @jerviedog! I can confirm that this patch worked for me

hermannschwaerzlerUIBK commented 5 years ago

Thank you from me as well, @jerviedog! I too can confirm that this works.

@jerviedog: are you planning to create a pull request? In case you lack time to do this and if you don't mind I can create one...

jerviedog commented 5 years ago

@hermannschwaerzlerUIBK Please do create a pull request if that helps. BTW, I actually don't know how to create one.

2iut128374gb21563 commented 3 years ago

In my case, the query file was in DOS end-of-line format (CRLF). Removing the CR to get Unix format (LF) solved the problem for me. Try it with, e.g., this bash command:

tr -d '\15\32' < winfile.txt > unixfile.txt
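In the command above, `\15` is the octal code for the carriage return and `\32` is Ctrl-Z, the old DOS end-of-file marker. Before converting, it may be worth checking whether a file contains carriage returns at all. A minimal sketch, in which `has_cr` and `strip_cr` are hypothetical helper names and only the carriage return is removed:

```shell
#!/bin/sh
# Sketch only: has_cr and strip_cr are hypothetical helpers, not MUMmer tools.

# Exit status 0 if the file contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Write a copy of the input file with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, `has_cr query.fasta && strip_cr query.fasta query.unix.fasta`, where the file names are placeholders.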

ankeetkumar commented 1 year ago

filename=filename.fasta (create a shell variable)

Then:

dos2unix $filename

It will help.

Ran my command:

ERROR: Could not parse input from 'Query File'. Please check the filename and format, or file a bug report
ERROR: postnuc returned non-zero
ianke@DESKT:/mnt/d/Projects/Ca$ dos2unix $input
dos2unix: converting file referencefasta.fasta to Unix format...

Ran my command again:

4: FINISHING DATA

margaretc-ho commented 7 months ago

The above fix from @jerviedog should be added to the develop branch too, @gmarcais, please! dos2unix was not sufficient to fix the issue for me. I think this is yet another time I will need to edit the MUMmer source code and recompile to get it to work :( and it is getting very maddening.