Closed necrolyte2 closed 9 years ago
There is an option to write an amos file. Otherwise, you can align the reads on the scaffolds and/or contigs.
Thanks @sebhtml I'll look into the amos option and how to interpret that format to extract what we are looking for
An Amos file tells you where each read is located in a contig.
There is a program named sam2amos, see http://bioinformatics.ninja/blog/2013/12/11/genome-scaffolders-suck/ and https://github.com/ctSkennerton/sam2amos
I think that the sam format is more mature and more popular than the amos format.
I suppose that you will want to do operations on ranges and other similar operations.
Sorry to poke on this some more, but we are kind of stuck a bit on this issue again
We are essentially just trying to extract all reads that assembled to each Contig that is in the produced Contig.fasta file by Ray.
We are now using the -amos
flag, which at first seemed like it would get us what we wanted but it seems there are 2 issues with the AMOS.afg file produced by Ray.
{TLE
blocks and thus no reads associated with themD
For reference here is an AMOS.afg file created by running Ray on some of our data
Example Contig block
{CTG
iid:60
eid:contig-11000003
com:
Software: Ray, MPI rank: 3
.
seq:
ACGTCCTGTTCCAAGGAACTTAGGTCGGTGGTTACTCAGAAGCATCCTCTACAAATTACAACTCGGACGTCGAAGACACCAGATTTCAAATTTGAGCTATTGCTGCTTCACTCGCCGTTACTAGAGCAATCCCTGTTGGTTTCTTTTC
.
qlt:
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
.
}
Not having the quality is not a huge deal, but we are unsure as how to interpret the {CTG
blocks that do not have any associated reads.
Quality strings only contain D characters because the Ray code does not store read quality into memory. So that is that.
I have not looked at the code that generates the TLE objects in a while. I remember that the code basically visits each vertex of the contig path and thread any read with annotations in the distributed k-mer graph. This code should take care of both strands. Maybe this is another bug lurking in the code.
We are looking to extract the reads that compose the end result contigs and I'm not sure where or how to get these right now since there is no BAM created nor can I find the file that has this info.
Any help is appreciated.