shendurelab / LACHESIS

The LACHESIS software, as described in Nature Biotechnology (http://dx.doi.org/10.1038/nbt.2727)

CreateScaffoldedFasta.pl error #46

Open bhagya-ct opened 6 years ago

bhagya-ct commented 6 years ago

mml@mml:/media/mml/6f60ef75-45fb-4532-9f2a-1a5d642a3093/3C_data/Ctrp_WT$ CreateScaffoldedFasta.pl PacBio_denovo.fasta out
Wed Apr 25 14:14:42 2018: CreateScaffoldedFasta.pl with input fasta = PacBio_denovo.fasta, OUTPUT_DIR = out
Wed Apr 25 14:14:42 2018: Found 7 ordering files ('group*.ordering' in out/main_results/).
Wed Apr 25 14:14:42 2018: Reading in sequences from assembly file PacBio_denovo.fasta
Wed Apr 25 14:14:42 2018: Found 141 contigs/scaffolds in assembly.
ERROR: Ordering file out/main_results/group0.ordering includes contig named 'tig00000015', not found in fasta file PacBio_denovo.fasta
Wed Apr 25 14:14:42 2018: Creating a scaffold from file out/main_results/group0.ordering...

However, PacBio_denovo.fasta does contain tig00000015.

I have not been able to figure out how to fix this.

Bhagya C T

phillip-mcclurg-driscolls commented 6 years ago

I have run into this problem as well with the FASTA output of FALCON. When parsing the FASTA file, the function "LoadFasta" does not appear to parse the header lines correctly: instead of splitting off the contig name (the text immediately following ">"), it sets the variable contig_name to the entire header line (without the ">"). The following modification of "LoadFasta" fixes this, and with it I have successfully created the LACHESIS assembly FASTA file. I had not looked at Perl code for some time, so this is a workaround and perhaps not the solution the authors would have chosen:

# LoadFasta: Convert a fasta file to contigs.
# Outputs:
# 1. An array of contig names.
# 2. A hash of contig name to contig sequence.
sub LoadFasta( $ ) {

    #print localtime() . ": LoadFasta: $_[0]\n";

    open IN, '<', $_[0] or die "ERROR: LoadFasta: Can't open $_[0]: $!";

    my $contig_name;
    my @contig_names;
    my %contig_seqs;
    while (<IN>) {
        chomp;
        if ( /^>\s*(\S+)/ ) {
            # Header line: keep only the first whitespace-delimited token
            # as the contig name and ignore the rest of the description.
            $contig_name = $1;
            push @contig_names, $contig_name;
        }
        elsif ( defined $contig_name ) {
            # Sequence line: append it to the current contig.
            $contig_seqs{$contig_name} .= $_;
        }
    }

    close IN;

    die "ERROR: LoadFasta: Couldn't parse file $_[0] properly.  Are you sure this is a FASTA file?" unless scalar @contig_names >= 1 && scalar keys %contig_seqs >= 1;

    return ( \@contig_names, \%contig_seqs );
}

I hope this helps!
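An alternative that leaves the Perl script untouched may be to trim the FASTA headers themselves, so that each header holds only the contig name. The sketch below is an assumption-laden illustration rather than anything from the LACHESIS distribution: the file names and the "len=... reads=..." header descriptions are hypothetical, and it assumes the ordering files refer to contigs by the first whitespace-delimited token of the header, as the error message above suggests.

```shell
# Hypothetical two-record FASTA with assembler-style headers (name, then a description).
workdir=$(mktemp -d)
printf '>tig00000015 len=8 reads=3\nACGTACGT\n>tig00000016 len=4 reads=1\nGGCC\n' > "$workdir/example.fasta"

# Keep only the first whitespace-delimited token of each header line;
# sequence lines are copied through unchanged.
awk '/^>/ { print $1; next } { print }' "$workdir/example.fasta" > "$workdir/example.clean.fasta"

# List the contig names as a parser that reads the whole header line would now see them.
grep '^>' "$workdir/example.clean.fasta"
```

If the cleaned names match those in the group*.ordering files, the cleaned FASTA could be passed to CreateScaffoldedFasta.pl in place of the original.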

bhagya-ct commented 6 years ago

@pwmcclurg,

Thank you for your reply. With a friend's help I was able to fix the error and extract the FASTA file.