thegenemyers / DALIGNER

Find all significant local alignments between reads

Error: "Block ... contains reads < 20bp long! Run DBsplit." #21

Closed haghshenas closed 8 years ago

haghshenas commented 9 years ago

Hi,

I have been trying to map some simulated PacBio reads (from PBSIM) to the human reference genome hg19. I ran the commands as advised on the website and in the README files.

For the reference genome:

$ fasta2DAM ref hg19.fasta
$ DBsplit ref
$ DBdust ref.1
...
$ DBdust ref.14

and for the reads:

$ fasta2DB read read.fasta
$ DBsplit read
$ DBdust read.1

Then, I ran HPCmapper to get the commands:

$ HPCmapper ref.dam read.db

Among the daligner commands generated by HPCmapper, the second command gives an error (the first one works without any problem):

$ daligner -A -k20 -h50 -e.85 ref.2 read.1
daligner: Block ref.2 contains reads < 20bp long !  Run DBsplit.

But I have already used DBsplit both for the reference genome and the reads. Could you give more information about this error and how I can resolve it?

Thanks!

thegenemyers commented 9 years ago

I believe the "problem" is that there are some very small "contigs" (<20bp) interspersed between N's in hg19.fasta. Call "DBsplit -x100 ref" before applying the mapper and I think all should then be fine. -- Gene
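
For anyone who wants to check a reference for such contigs before building the database, here is a rough sketch. It assumes, as the reply above suggests, that each stretch of sequence between runs of N ends up as its own "contig"; that assumption is not verified against the fasta2DAM source. The helper name `short_contigs` is made up for this example and is not part of DALIGNER or DAZZ_DB.

```shell
# Hypothetical helper (not part of DALIGNER/DAZZ_DB): list the non-N
# stretches of a FASTA file that are shorter than a cutoff.
# Usage: short_contigs <fasta> [min_len]   (min_len defaults to 20)
short_contigs() {
    awk -v min="${2:-20}" '
        # Split the accumulated sequence at runs of N and print every
        # non-empty piece shorter than min as "<name>\t<length>".
        function report(   n, i, parts) {
            if (name == "") return
            n = split(seq, parts, /[Nn]+/)
            for (i = 1; i <= n; i++)
                if (length(parts[i]) > 0 && length(parts[i]) < min)
                    print name "\t" length(parts[i])
        }
        /^>/ { report(); name = substr($1, 2); seq = ""; next }  # new record
             { seq = seq $0 }                                    # join wrapped lines
        END  { report() }
    ' "$1"
}
```

Running `short_contigs hg19.fasta 20` prints one line per short stretch. If it prints anything, those are presumably the pieces daligner complains about, and a length cutoff such as the `-x100` suggested above should exclude them.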

haghshenas commented 9 years ago

That helped. daligner works now.

Thank you.