gt1 / daccord

d'accord is a non hybrid long read consensus program based on local de Bruijn graph assembly

some questions about the usage of 'split_dis' #6

Open bitcometz opened 6 years ago

bitcometz commented 6 years ago

Hello,

I think 'split_dis' is a great idea, and I want to use it to improve my genome assembly, for example to produce more complete repeat regions. I have a question about its usage:

I know cons.fasta contains the "corrected" reads in FASTA format. But I am not sure whether "in.las" and "in.db" refer to the "corrected" reads or to the "raw" (uncorrected) reads.

Thanks!

split_dis

split_dis performs disagreement based read pile splitting for haplotypes and repeats. It expects four arguments:

out.las: the name of the output file, which will be written in the LAS file format
cons.fasta: a read consensus file for the reads in the input database in FastA format as produced by daccord
in.las: alignments for in.db as generated by DALIGNER
in.db: input read database

bitcometz commented 6 years ago

Hello,

I used the daccord-corrected reads (in.fasta) as input to daligner, which produced raw_data.1.las, and then ran daccord on that alignment to generate preads.1.fasta. Then I tried to run split_dis:

bin/split_dis -t5 -d30 -D200 out1.las ./preads.1.fasta ./raw_data.1.las ./raw_data.dam

The log stops at the output below, and the program keeps running without printing anything further:

DC[186]=0 1333
DC[187]=0 1270
DC[188]=0 1270
DC[189]=0 1331
DC[190]=0 9393
DC[191]=0 1270
DC[192]=0 9094
DC[193]=0 1413
DC[194]=0 8424
DC[195]=0 6801
DC[196]=0 1331
DC[197]=0 1283
DC[198]=0 1331
[V] keep 9 106;120;146;150;188;230;239;315;321;340;370;444;447;459;465;511;574;575;636;654;762;763;767;791;798;881;905;910;913;969;983;
[V] drop 9
[V] read id 9 time 77478400

bitcometz commented 6 years ago

I also tried this:

bin/computeextrinsicqv in.fasta preads.1.fasta raw_data.dam 1

Memory usage surged to 100G, which was hard to believe because the FASTA file is only 33 Mbp.
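When a tool uses far more memory than the input size suggests, it can help to confirm the input size independently first. The following is a minimal sketch, not part of daccord, that counts records and total bases in FASTA text. The function name count_fasta_bases and the inline example reads are hypothetical and only for illustration.

```python
import io


def count_fasta_bases(handle):
    """Return (number of records, total bases) for FASTA text read from handle."""
    records = 0
    bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records += 1  # header line starts a new record
        else:
            bases += len(line)  # sequence line, possibly one of several per record
    return records, bases


# Hypothetical in-memory example; for a real file use open("in.fasta").
example = io.StringIO(">read1\nACGT\nAC\n>read2\nGGG\n")
print(count_fasta_bases(example))  # (2, 9)
```

For a real file, passing an open file handle instead of the StringIO object gives the same kind of summary without loading the whole file into memory.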

gt1 commented 6 years ago

Hi,

concerning computeextrinsicqv: I forgot to update the documentation for the expected arguments. It is updated now; please try again following the most recent README.md hints.

about split_dis: this is so far mainly a proof of concept program. While it works as a general idea, it does not really cope yet with high depth or a large number of repeat instances. I have used it to separate up to 7 or 8 copies at depth 20 each, but this is already pretty slow. Anything more will probably take forever with the current implementation.

Best, German
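To put the figures in the reply above in perspective, here is a rough back-of-the-envelope sketch. It assumes, as a simplification not stated in the thread, that reads from all copies of a repeat align to each other, so the pile that has to be split has a depth of about copies times depth per copy. The function name combined_pile_depth is hypothetical.

```python
def combined_pile_depth(copies, depth_per_copy):
    """Approximate pile depth when reads from all repeat copies pile up together.

    Assumption for illustration only: every copy contributes its full coverage.
    """
    return copies * depth_per_copy


# The reply mentions 7 or 8 copies at depth 20 each.
for copies in (7, 8):
    print(copies, "copies ->", combined_pile_depth(copies, 20), "reads per pile")
```

Under that assumption, the cases described as already slow correspond to piles of roughly 140 to 160 reads.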

bitcometz commented 6 years ago

Thanks very much for your help