gt1 / daccord

d'accord is a non hybrid long read consensus program based on local de Bruijn graph assembly

daccord error correction #3

Open Mkchouk opened 7 years ago

Mkchouk commented 7 years ago

Hello, can you show us how to correct long reads using daccord? Do I just run ./src/daccord reads.las reads.dam? And how do I generate reads.las and reads.dam?

thanks

gt1 commented 7 years ago

As the README.md says, you need to first convert your input FastA file(s) to a dazzler database using fasta2DB or fasta2DAM from DAZZ_DB. Use something like

fasta2DAM reads.dam reads.fasta
DBsplit -s256 -x1000 reads.dam

Then run daligner on this database to produce alignments in the LAS format. Try

HPC.daligner reads.dam | bash

You need to have the daligner directory in your PATH for this to work.

Afterwards you can run

src/daccord reads.las reads.dam >reads_daccord.fasta
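The steps above can be collected into one small script. This is only a sketch: the function name daccord_pipeline_commands is made up for illustration, it assumes the DAZZ_DB and DALIGNER tools are in your PATH and daccord is at src/daccord, and it assumes (as the commands above do) that alignment ends with a single reads.las. It prints the commands rather than running them, so you can inspect them first.

```shell
#!/usr/bin/env bash
# Sketch only: prints the four commands from this thread for a given FastA file.
# Assumptions (not checked here): DAZZ_DB and DALIGNER are in PATH,
# daccord is at src/daccord, and daligner produces a single <base>.las.

daccord_pipeline_commands() {
    local fasta="$1"
    local base="${fasta%.fasta}"   # reads.fasta -> reads

    # 1. Convert the FastA file to a dazzler database and split it into blocks.
    echo "fasta2DAM ${base}.dam ${fasta}"
    echo "DBsplit -s256 -x1000 ${base}.dam"

    # 2. Generate and run the daligner jobs to produce LAS alignments.
    echo "HPC.daligner ${base}.dam | bash"

    # 3. Compute corrected reads from the alignments.
    echo "src/daccord ${base}.las ${base}.dam >${base}_daccord.fasta"
}
```

Calling `daccord_pipeline_commands reads.fasta` prints the commands; piping that output to bash would run them in order.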

Mkchouk commented 7 years ago

Thank you for your response. So I must install the Dazzler database tools (DAZZ_DB) and DALIGNER to produce the LAS alignment files? Thank you.

gt1 commented 7 years ago

Yes, you need DAZZ_DB and DALIGNER.

Mkchouk commented 7 years ago

HPC.daligner reads.dam | bash generated three .las files for me. Do I run daccord on every file to obtain the corrected reads?

src/daccord reads.1.las reads.dam >reads_daccord.1.fasta
src/daccord reads.2.las reads.dam >reads_daccord.2.fasta
src/daccord reads.3.las reads.dam >reads_daccord.3.fasta

Thank you for your response.

gt1 commented 7 years ago

Yes. Each of the LAS files contains alignments for a block of reads. daccord will output corrected reads for the reads it finds in these files.
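With many blocks, the per-file calls can be written as a loop. The following is a rough sketch, not part of daccord: the helper names daccord_output_name and correct_blocks and the DRY_RUN switch are hypothetical, and it assumes the block files are named like reads.1.las and that daccord is at src/daccord, as in the commands quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: run daccord once per LAS block file.
# Assumptions: block files are named <prefix>.<N>.las and daccord is at src/daccord.

# Map a block file name to an output name: reads.1.las -> reads_daccord.1.fasta
daccord_output_name() {
    local las="$1"
    local stem="${las%.las}"     # reads.1
    local prefix="${stem%.*}"    # reads
    local block="${stem##*.}"    # 1
    printf '%s_daccord.%s.fasta\n' "$prefix" "$block"
}

# Run daccord for every block file given; with DRY_RUN=1 only print the commands.
correct_blocks() {
    local db="$1"; shift
    local las out
    for las in "$@"; do
        out="$(daccord_output_name "$las")"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "src/daccord $las $db >$out"
        else
            src/daccord "$las" "$db" >"$out"
        fi
    done
}
```

For example, `DRY_RUN=1 correct_blocks reads.dam reads.*.las` prints one daccord command per block file without running anything.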

Mkchouk commented 7 years ago

Is it possible to clarify the tuning algorithm? There are many mathematical equations and since I am not a mathematician I have a hard time understanding the algorithm generally. thank you

gt1 commented 7 years ago

The program should be just fine with its default settings. If you are looking at a repetitive genome, you may consider setting the -D parameter to twice the average sequencing depth, for example -D60 for a sequencing depth of 30. This will only load the (up to) 60 "best" alignments for each read. You may also consider running

computeintrinsicqv -d30 reads.db reads.las
lasfilteralignments reads.db reads.las

which will create reads_filtered.las. Here the -d switch needs to be set to the average sequencing depth. Afterwards pass reads_filtered.las to daccord.
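The arithmetic behind the two depth-related switches can be sketched as below. The helper names average_depth and daccord_depth_flag are hypothetical. Estimating the depth as total sequenced bases divided by genome size is the usual definition of coverage and is an assumption added here, not something stated in this thread. Only the "twice the depth" rule for -D comes from the reply above.

```shell
#!/usr/bin/env bash
# Sketch only: derive the depth-related settings mentioned above.

# Assumption: average depth = total sequenced bases / genome size (integer division).
average_depth() {
    local total_bases="$1"
    local genome_size="$2"
    echo $(( total_bases / genome_size ))
}

# Rule of thumb from the thread: -D is twice the average sequencing depth.
daccord_depth_flag() {
    local depth="$1"
    echo "-D$(( 2 * depth ))"
}
```

With 150000000 sequenced bases over a 5000000 bp genome, `average_depth 150000000 5000000` gives 30, which is the value for the -d switch of computeintrinsicqv, and `daccord_depth_flag 30` gives -D60.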

Mkchouk commented 7 years ago

No, that was not my question. My question is about the daccord algorithm itself: how does daccord correct errors in long reads? Thanks.

gt1 commented 7 years ago

Is there any particular detail you're interested in? Any particular issues with understanding the paper?

Mkchouk commented 7 years ago

Thank you for your reply. Yes, I did not understand how daccord corrects long reads. I want to understand the daccord algorithm. Can you explain the algorithm in detail, please? Thank you.

Mkchouk commented 7 years ago

Any response, please? Thanks.

smm19900210 commented 6 years ago

Is the software only for PacBio data?

smm19900210 commented 6 years ago

I get this error:

daligner: Block test.2 contains reads < 14bp long ! Run DBsplit

What's wrong?

gt1 commented 6 years ago

daccord should work for any kind of data that loosely follows its error model of randomly occurring errors.