bhagya-ct opened 6 years ago
I have run into this problem as well with the FASTA output of FALCON. When parsing the FASTA file, it appears that the function "LoadFasta" does not parse the header lines correctly: instead of splitting off the contig name (the text immediately following ">"), the variable contig_name holds the entire header line (without the ">"). The following modification of "LoadFasta" handles this correctly, and with this change I have successfully created the Lachesis assembly FASTA file. I had not looked at Perl code for some time, so this is a workaround and perhaps not the solution the authors would have chosen:
```perl
sub LoadFasta( $ ) {
    #print localtime() . ": LoadFasta: $_[0]\n";
    open my $in, '<', $_[0] or die "ERROR: LoadFasta: Can't open $_[0]: $!";
    my $contig_name;
    my @contig_names;
    my %contig_seqs;
    while (<$in>) {
        chomp;
        if ( /^>\s*(\S+)/ ) {
            # Header line: keep only the first whitespace-delimited token,
            # so ">name description" yields "name", matching the contig
            # names used in the group*.ordering files.
            $contig_name = $1;
            push @contig_names, $contig_name;
        }
        elsif ( defined $contig_name ) {
            # Sequence line: append it to the current contig.
            $contig_seqs{$contig_name} .= $_;
        }
    }
    close $in;
    die "ERROR: LoadFasta: Couldn't parse file $_[0] properly. Are you sure this is a FASTA file?"
        unless scalar @contig_names >= 1 && scalar keys %contig_seqs >= 1;
    return ( \@contig_names, \%contig_seqs );
}
```
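To see concretely what the change does, here is a minimal standalone sketch (not part of Lachesis) that applies the same rule to a single header line: keep only the first whitespace-delimited token after ">". The helper name `header_to_contig_name` and the sample header are made up for illustration. Because `\S+` stops at any whitespace, a trailing description or a Windows carriage return is never included in the name.

```perl
use strict;
use warnings;

# Hypothetical helper (not from Lachesis): return the contig name from a
# FASTA header line, i.e. the first whitespace-delimited token after ">".
# Returns undef for lines that are not headers.
sub header_to_contig_name {
    my ($line) = @_;
    # \S+ stops at the first space, tab, or carriage return, so a trailing
    # description or a Windows "\r" is never part of the name.
    return $line =~ /^>\s*(\S+)/ ? $1 : undef;
}

# Made-up example header with a description after the name.
my $header = ">tig00000015 len=12345\n";

# What the unpatched LoadFasta effectively used as the contig name:
# everything after ">" (with the newline removed).
( my $whole = $header ) =~ s/^>//;
chomp $whole;

print "whole header: '$whole'\n";
print "contig name:  '", header_to_contig_name($header), "'\n";
```

The first line prints the full header including the description; the second prints only `tig00000015`, which is the form an ordering file would refer to.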
I hope this helps!
@pwmcclurg, thank you for your reply. With the help of a friend I was able to fix the error and extract the .FASTA file.
```
mml@mml:/media/mml/6f60ef75-45fb-4532-9f2a-1a5d642a3093/3C_data/Ctrp_WT$ CreateScaffoldedFasta.pl PacBio_denovo.fasta out
Wed Apr 25 14:14:42 2018: CreateScaffoldedFasta.pl with input fasta = PacBio_denovo.fasta, OUTPUT_DIR = out
Wed Apr 25 14:14:42 2018: Found 7 ordering files ('group*.ordering' in out/main_results/).
Wed Apr 25 14:14:42 2018: Reading in sequences from assembly file PacBio_denovo.fasta
Wed Apr 25 14:14:42 2018: Found 141 contigs/scaffolds in assembly.
ERROR: Ordering file out/main_results/group0.ordering includes contig named 'tig00000015', not found in fasta file PacBio_denovo.fasta
Wed Apr 25 14:14:42 2018: Creating a scaffold from file out/main_results/group0.ordering...
```
However, PacBio_denovo.fasta does contain tig00000015.
I am unable to figure out how to fix this.
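One way to narrow this down is to check exactly how the name appears in the FASTA headers. The sketch below is a hypothetical helper (the script name `check_contig.pl` and the function `find_contig_headers` are made up, not part of Lachesis). It lists every header whose first whitespace-delimited token equals the given name, and reports whether the whole header equals the name and whether it still contains a Windows carriage return. If `exact_match` is 0 because the header carries a description after the name, one plausible but unverified explanation is that the FASTA loader used by CreateScaffoldedFasta.pl keys contigs by the whole header, as described above for LoadFasta. If `has_cr` is 1, the file likely has Windows line endings.

```perl
use strict;
use warnings;

# Hypothetical diagnostic (not part of Lachesis): report how a contig name
# such as 'tig00000015' appears among the FASTA header lines.
sub find_contig_headers {
    my ( $fasta, $name ) = @_;
    open my $in, '<', $fasta or die "Can't open $fasta: $!";
    my @hits;
    while ( my $line = <$in> ) {
        next unless $line =~ /^>/;
        chomp $line;
        ( my $header = $line ) =~ s/^>//;
        # split ' ' splits on runs of whitespace (including "\r").
        my ($first_token) = split ' ', $header;
        next unless defined $first_token && $first_token eq $name;
        push @hits, {
            line        => $.,                        # line number in file
            header      => $header,
            exact_match => $header eq $name ? 1 : 0,  # whole header == name?
            has_cr      => $header =~ /\r/  ? 1 : 0,  # Windows line ending?
        };
    }
    close $in;
    return @hits;
}

# Command-line use: perl check_contig.pl <fasta> <contig_name>
if ( @ARGV == 2 ) {
    my ( $fasta, $name ) = @ARGV;
    my @hits = find_contig_headers( $fasta, $name );
    print "No header starts with '$name'\n" unless @hits;
    for my $h (@hits) {
        ( my $shown = $h->{header} ) =~ s/\r/\\r/g;   # make "\r" visible
        printf "line %d: exact_match=%d has_cr=%d header='%s'\n",
            $h->{line}, $h->{exact_match}, $h->{has_cr}, $shown;
    }
}
```

For example, `perl check_contig.pl PacBio_denovo.fasta tig00000015` would print one line per matching header.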
Bhagya C T