Closed pb-cdunn closed 8 years ago
A file may contain the data from multiple SMRT cells provided the reads for each SMRT cell are consecutive in the file.
Ugh! Is that the problem? I was happy to learn that fasta2DB would finally handle a mixture of movies within a fasta input, so that I could drop our pre-processing. But it doesn't actually solve the problem. We still need to pre-process.
What if dexta re-ordered the reads for us, so that SMRT cells are consecutive?
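For reference, the kind of pre-processing meant here can be sketched in a few lines of Python. This is only an illustration, not part of DAZZ_DB or DEXTRACTOR: the function name `group_by_movie` is hypothetical, it assumes Pacbio-style headers of the form `>movie/well/start_end` (so the movie name is everything before the first `/`), and it reads the whole input into memory. Reads keep their original relative order within each movie.

```python
def group_by_movie(fasta_text):
    """Regroup FASTA records so that all reads of one movie are consecutive.

    Assumes Pacbio-style headers (">movie/well/start_end ..."); the movie
    name is taken as the header text before the first "/".
    """
    groups = {}    # movie name -> list of records; dicts keep insertion order
    record = None  # lines of the record currently being read
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            movie = line[1:].split("/", 1)[0]
            record = [line]
            groups.setdefault(movie, []).append(record)
        elif record is not None:
            record.append(line)  # sequence line belonging to the current header

    out = []
    for records in groups.values():
        for rec in records:
            out.extend(rec)
    return ("\n".join(out) + "\n") if out else ""


if __name__ == "__main__":
    scrambled = (
        ">movA/1/0_4\nACGT\n"
        ">movB/7/0_4\nGGCC\n"
        ">movA/2/0_4\nTTAA\n"
    )
    print(group_by_movie(scrambled), end="")
```

Movies come out in order of first appearance, so a file that is already consecutive passes through unchanged.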
Yes, that is the problem. I don't understand how in the world your headers got all mixed up. Presumably you initially extracted the information from bax.h5 files, and at that time the headers were consecutive. I can see someone then concatenating several bax-extracted fastas into a single file, which is what I was anticipating with the new version. But I did not expect someone to effectively shuffle them, and I don't see why or how that is necessary or useful. Please explain to me why your reads are a complete scramble from many different SMRT cells.
If you insist on having them totally scrambled, then use fasta2DAM. The only thing you will lose is the knowledge of whether or not two reads come from the same well -- but given that your reads are a complete scramble I presume you don't care about that anyway ;-)
-- Gene
On 6/11/16, 8:00 PM, Christopher Dunn wrote:
https://dazzlerblog.wordpress.com/command-guides/dazz_db-command-guide/
- https://dazzlerblog.wordpress.com/command-guides/dextractor-command-guide/#dextract
- https://github.com/thegenemyers/DEXTRACTOR
LOL. Now that you mention it, that makes sense to me. I'll talk to the folks who gave me these data. Closing this Issue.
I run this:
I find that orig.db keeps growing dramatically, with many lines for each fasta like this:
Is this expected?