pbjd closed this 10 years ago
The dazzler DB currently does not support cycle numbers greater than 2^16, as you have observed. I am surprised that such cycle numbers occur in your .fasta file, as even with a 3-hour run I don't think the machine captures that many cycles. Can I ask where the .fasta files came from? How was the machine run?
A better fix would be to simply change the cycle numbers in fasta2DB but preserve the length of the interval. That way the data is not thrown away; only the actual cycle numbers are lost when one tries to recreate the input .fasta files with DB2fasta.
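A minimal sketch of that idea, not the actual fasta2DB code: the function name `rebase_range` and the choice to shift an oversized interval so that it starts at 0 are assumptions made for illustration. The only property taken from the comment above is that the interval length `end - beg` is preserved while the absolute cycle numbers are given up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of fasta2DB): make a subread interval fit
 * into 16-bit fields while keeping its length.
 *
 * Returns  0 if the interval already fits and is copied unchanged,
 *          1 if it was shifted to start at 0 (original cycle numbers lost),
 *         -1 if the interval is invalid or its length alone exceeds 16 bits.
 */
static int rebase_range(int beg, int end, int *new_beg, int *new_end)
{
    int len;

    assert(new_beg != 0 && new_end != 0);

    len = end - beg;
    if (beg < 0 || len < 0 || len > UINT16_MAX)
        return -1;

    if (end <= UINT16_MAX) {        /* fits as-is, nothing to change */
        *new_beg = beg;
        *new_end = end;
        return 0;
    }

    *new_beg = 0;                   /* assumption: rebase to cycle 0 */
    *new_end = len;                 /* interval length is preserved  */
    return 1;
}
```

With this sketch an interval such as 70000 to 72000 would be stored as 0 to 2000, so the read keeps its 2000 cycles but a round trip back to .fasta no longer reproduces the original header coordinates.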
Ultimately, I expect this machine to start producing reads of length over 2^16, at which point a number of shorts will have to become ints (and a few things will run a bit slower as a result).
Mods addressed in my branch, thanks!
Ran into this while integrating with HGAP. Casting ints to ushorts for subread ranges larger than 2^16 corrupts the beg/end values in the record buffer. A side effect of this is that the pbid in the DB becomes corrupted, creating a book-keeping issue in HGAP. Here are some examples found in my dataset:
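To illustrate the kind of corruption described (these are made-up values, not the examples from the dataset), here is a small sketch of what happens when int bounds are stored in 16-bit unsigned fields. The struct `ShortRange` and both function names are hypothetical stand-ins, not the DB's actual record layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a record that keeps subread bounds in 16 bits. */
typedef struct {
    uint16_t beg;
    uint16_t end;
} ShortRange;

/* Store int bounds in 16-bit fields; values above 65535 wrap modulo 65536. */
static ShortRange store_range(int beg, int end)
{
    ShortRange r;

    assert(beg >= 0 && end >= beg);
    r.beg = (uint16_t) beg;   /* well-defined in C: reduced modulo 2^16 */
    r.end = (uint16_t) end;
    return r;
}

/* Returns 1 if both bounds can be stored without wrapping, 0 otherwise. */
static int range_fits(int beg, int end)
{
    return beg >= 0 && end >= beg && end <= UINT16_MAX;
}
```

For instance, `store_range(70000, 72000)` yields beg 4464 and end 6464, and `store_range(60000, 70000)` yields beg 60000 and end 4464, so the stored end can even fall before the stored beg. Anything that later rebuilds an identifier from those fields would see the wrong coordinates.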
I put in a simple fix to just skip subreads that fall in this category and emit a warning.
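Roughly, the skip-and-warn approach looks like the sketch below. This is not the code from the branch: the `Subread` struct, the function `filter_subreads`, and the warning text are all hypothetical, and the check assumes the limit is what fits in an unsigned 16-bit field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory form of a subread interval parsed from a header. */
typedef struct {
    int beg;
    int end;
} Subread;

/*
 * Compact the array in place, dropping every subread whose bounds would not
 * fit in 16-bit unsigned fields, and warn on stderr for each one dropped.
 * Returns the number of subreads kept.
 */
static int filter_subreads(Subread *reads, int n, const char *source)
{
    int i, kept = 0;

    assert(reads != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (reads[i].beg < 0 || reads[i].end > UINT16_MAX) {
            fprintf(stderr,
                    "warning: %s: skipping subread %d_%d (exceeds 16-bit range)\n",
                    source, reads[i].beg, reads[i].end);
            continue;
        }
        reads[kept++] = reads[i];   /* keep in-range subreads, order preserved */
    }
    return kept;
}
```

The trade-off is that the skipped subreads are dropped from the DB entirely, which keeps the remaining identifiers intact at the cost of losing data.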