Open sjackman opened 8 years ago
I haven't seen that one before - did you use the -mean
setting for DistanceEst?
I did at first, and I had suspected that's what had gone wrong. But, no, I repeated the estimates with --mle
and got the same error.
I'm at a bit of a loss - could you send the .scaf file and .asqg.gz (or contig fasta, if the error appears with that input too). I won't be able to look until next week though.
Here you go:
sga scaffold2fasta -m 500 --write-unplaced --use-overlap -o hsapiens-scaffolds.fa -a hsapiens-graph.asqg.gz hsapiens.scaf
Any chance you've had some time to look into this issue, Jared?
Sorry, no. My time is short these days I'm afraid. Is this blocking you from something?
No worries. I am sympathetic to time constraints. We wanted to include SGA in an assembly comparison of GIAB HG004. SGA will be included in the comparison of contigs. Including scaffold results hinges on this issue.
What's the format of the .scaf
file? Perhaps I could use MergeContigs
from ABySS as a hack work around to create the scaffolds FASTA file.
contig-6181580 contig-4954829,228,151.2,0,0,D contig-197051,2178,122.2,0,1,D
The format is:
std::ostream& operator<<(std::ostream& out, const ScaffoldLink& link)
{
out << link.endpointID << "," << link.distance << "," << link.stdDev << ","
<< link.edgeData.getDir() << "," << link.edgeData.getComp() << "," << link.getTypeCode();
return out;
}
https://github.com/jts/sga/blob/master/src/Scaffold/ScaffoldLink.cpp#L118
Could I convert this to an ABySS path file like so, or is it not quite so simple? How do I convert the Dir
and Comp
components to a +
/ -
orientation?
1 contig-6181580+ 100N contig-4954829+ 100N contig-197051-
Is the .scaf
file the final list of which contigs go in which scaffolds with one scaffold per line, or is it a list of distance estimates that still need to be merged into larger scaffolds?
The former - it is the final list of contigs that go into scaffolds.
Can you give me a brief description of how to convert a SGA scaf
file to an ABySS path
file? If you are able to convert the following .scaf
record manually, I can figure it out and write the converter.
contig-1297628 contig-1106169,-24,25.3,0,1,D contig-4496238,-47,17.3,1,1,D contig-2839914,-92,242.9,0,1,D contig-1247260,-62,97.3,1,1,D contig-2437944,3840,177,0,0,D contig-368915,112,186.4,0,1,D contig-4976804,-306,22.8,1,0,D contig-3455841,371,216.8,1,1,D contig-3969805,201,94.5,0,0,D contig-742829,-92,135.6,0,1,D contig-3111798,11,28.9,1,1,D contig-1750901,-79,27.5,0,0,D
Uses cases like this one are why I so badly want to a GFA standard! It would be great if scaffold2fasta
and MergeContigs
accepted the same file formats and were interchangeable. Please do comment on the proposed GFA 2 spec if you find the time. The format of the path record is one of the file items to nail down. https://github.com/GFA-spec/GFA-spec/pull/48
Hi Shaun,
Agreed about GFA2.
I describe the file format here:
The orientation bit describes whether the scaffold should be built left-to-right (the first contig in the record is the first contig in the scaffold) or right-to-left (the first contig is the last record). See here:
https://github.com/jts/sga/blob/master/src/Scaffold/ScaffoldRecord.cpp#L45
I suggest you don't spend much time on this - SGA is no longer under active development and there are better short read assemblers out there.
Sorry I can't help more!
Jared
No worries, Jared. Quick comments are helpful. We're including SGA in this comparison to ABySS 2.0 due to its low memory usage. We're also comparing to DISCOVARdenovo, and it yields great contiguity, but takes a lot of memory and time.
From the code it appears that only the direction bit of the first link matters, and the rest are ignored. Is that correct? https://github.com/jts/sga/blob/master/src/Scaffold/ScaffoldRecord.cpp#L60
Yes I believe that is correct but verify :)
On Wed, Nov 2, 2016 at 1:48 PM, Shaun Jackman notifications@github.com wrote:
No worries, Jared. Quick comments are helpful. We're including SGA in this comparison due to its low memory usage.
From the code it appears that only the direction bit of the first link matters, and the rest are ignored. Is that correct? https://github.com/jts/sga/blob/master/src/Scaffold/ScaffoldRecord.cpp#L60
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jts/sga/issues/125#issuecomment-257943999, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXxn0Uo125sAzXYLlyxrrUIKqEG5onfks5q6MzWgaJpZM4JYrqM .
Here's a hacktastic script to convert a SGA .scaf
file to an ABySS .path
file. Thanks for your help, Jared.
#!/bin/sh
set -eu
exec gawk -e '
!/\t/ { next }
{
# Determine whether the contigs are in reverse order.
reverse = /^[^\t]*\t[^\t]*,1,[01],D/
# Orient the contigs.
$1 = $1 "+"
rc = 0
for (i = 2; i <= NF; ++i) {
if ($i ~ /,1,D/)
rc = !rc
sub(/,.*/, rc ? "-" : "+", $i)
}
# Print the contig IDs and orientations.
printf "%u", id++
if (reverse) {
for (i = NF; i > 0; --i)
printf " %s", $i
printf "\n"
} else {
print " " $0
}
}' "$@" \
| gsed -e 's/ /\t/;s/ / 199N /g'
Great, thanks for the script.
Any suggestions?