sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Colour-space encoder / decoder #1

Closed: gringer closed this 13 years ago

gringer commented 13 years ago

I've adjusted the colour-space encoder to work more directly on the input string, propagated those changes down to the loader, and made the colour-space loader a bit more generic (it now supports both csfasta and csfastq formats). Note that this encoding preserves the first base and outputs single-encoded colour-space.

Unfortunately, I'm now a bit stuck. While I can test my encoding and decoding, I don't think I can test whether reads are loading into the graph properly, because the encoding I'm using ([ACGT][0-4]+, i.e. single encoding) differs from the one you are using ([ACGT]+, i.e. double encoding).
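The difference between the two encodings can be illustrated with a small conversion sketch, assuming the usual double-encoding convention where colour 0 is written as A, 1 as C, 2 as G and 3 as T. The helper names `single_to_double` and `double_to_single` are hypothetical. Note that the double-encoded string on its own carries no first base, so that base has to be kept somewhere else.

```python
from typing import Tuple

# Double encoding: each colour digit is written as a base letter.
DIGIT_TO_LETTER = {"0": "A", "1": "C", "2": "G", "3": "T"}
LETTER_TO_DIGIT = {letter: digit for digit, letter in DIGIT_TO_LETTER.items()}


def single_to_double(read: str) -> Tuple[str, str]:
    """Split a single-encoded read (e.g. 'T3210') into its first base
    and the double-encoded colours (e.g. 'TGCA').

    A missing-call symbol (such as '4' or '.') has no letter in this
    sketch and raises KeyError.
    """
    if not read:
        raise ValueError("empty read has no first base")
    first_base, colours = read[0], read[1:]
    return first_base, "".join(DIGIT_TO_LETTER[c] for c in colours)


def double_to_single(first_base: str, letters: str) -> str:
    """Rebuild a single-encoded read from a first base and double-encoded colours."""
    return first_base + "".join(LETTER_TO_DIGIT[c] for c in letters)
```

For example, `single_to_double("T3210")` returns `("T", "TGCA")`, and `double_to_single("T", "TGCA")` gives back `"T3210"`.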

Anyway, my changes so far are here in this branch, in case you are interested.

sebhtml commented 13 years ago

Hello,

The first base should be recorded in structures/Read.

Also, symbols (from {0,1,2,3} or from {A,C,G,T}) are encoded in 2 bits in Ray, so I think your changes will break support for color-space (because of the N).

In Ray:

'0' and 'A' are 00
'1' and 'C' are 01
'2' and 'G' are 10
'3' and 'T' are 11
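The mapping above can be sketched as a 2-bit packer, which also shows why an N (or any fifth symbol) is a problem: four codes are already used up. This is a minimal sketch; the helper names `pack_2bit` and `unpack_2bit`, and the choice of putting the first symbol in the highest bits of a Python integer, are assumptions, not Ray's actual storage layout.

```python
# 2-bit codes shared by colour digits and bases, per the mapping above.
SYMBOL_CODE = {
    "0": 0b00, "A": 0b00,
    "1": 0b01, "C": 0b01,
    "2": 0b10, "G": 0b10,
    "3": 0b11, "T": 0b11,
}


def pack_2bit(symbols: str) -> int:
    """Pack symbols into an int, 2 bits each, first symbol in the highest bits."""
    value = 0
    for s in symbols:
        if s not in SYMBOL_CODE:
            # There is no spare 2-bit code left for N or a missing call.
            raise ValueError(f"symbol {s!r} has no 2-bit code")
        value = (value << 2) | SYMBOL_CODE[s]
    return value


def unpack_2bit(value: int, length: int, alphabet: str = "ACGT") -> str:
    """Unpack `length` symbols, using `alphabet` ('ACGT' or '0123') for output."""
    out = []
    for i in range(length - 1, -1, -1):
        out.append(alphabet[(value >> (2 * i)) & 0b11])
    return "".join(out)
```

Both `pack_2bit("ACGT")` and `pack_2bit("0123")` give `27` (binary `00011011`), while `pack_2bit("ACNT")` raises `ValueError`. The length has to be stored separately, because leading A/0 symbols are all-zero bits and would otherwise be lost.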

Can you test on http://solidsoftwaretools.com/gf/project/ecoli50x50/ to see whether it works?

Thank you.

                                   Sébastien

gringer commented 13 years ago

I see. I'll close this request while I make changes so it works with the structures you have. Storing the first base in structures/Read sounds like it will work for this.