thegenemyers / DALIGNER

Find all significant local alignments between reads

How to get more sensitive result? #34

Closed mudesheng closed 8 years ago

mudesheng commented 8 years ago

Hi Gene, I am using daligner to align PacBio reads to a De Bruijn graph built from Illumina reads, and I need more sensitive alignments so that they cover the graph paths. However, I found that some graph edges are missing from the results. My script is "PCmapper -k15 -l100 -w7 -h20 K203 head2M >K203.sh". How can I set the parameters to get more sensitive alignments? Thanks

Desheng

thegenemyers commented 8 years ago

The parameters are already very low. I'm worried about -l100: with 10Kbp+ reads it's not clear to me why you are looking for 100bp matches. daligner was not designed for this. My guess is you have De Bruijn graph edges that are this short? I think a better strategy when you have very short graph edges would be to enumerate several paths, each giving a longer string, and see which one gets mapped. A 100bp match at 85% is not terribly significant.
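A minimal sketch of the path-enumeration idea, for illustration only: it is not part of daligner. It assumes the graph is held as a dict of edge sequences plus a successor map, and that consecutive edges share a fixed-length overlap that is shorter than every edge. The names `extend_paths`, `edges`, `succ`, `overlap` and `min_len` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def extend_paths(
    edges: Dict[str, str],
    succ: Dict[str, List[str]],
    start: str,
    overlap: int,
    min_len: int,
) -> Iterator[Tuple[List[str], str]]:
    """Yield (path, merged sequence) for each walk from `start` that
    reaches at least `min_len` bases.

    Assumes every edge is longer than `overlap`, so each step adds at
    least one base and the search terminates even on cyclic graphs.
    Walks that hit a dead end before reaching `min_len` are dropped.
    """
    stack = [([start], edges[start])]
    while stack:
        path, seq = stack.pop()
        if len(seq) >= min_len:
            yield path, seq
            continue
        for nxt in succ.get(path[-1], []):
            # Append only the bases of the next edge that lie past the overlap.
            stack.append((path + [nxt], seq + edges[nxt][overlap:]))
```

Each merged string could then be written out as a longer target sequence and mapped in place of the short edge, to see which path the read supports.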

The only thing I would suggest is reducing -k to 12, 13, or 14, although with those parameters it will be slow. I would also guess that -h could actually be a bit higher without compromising sensitivity (e.g. 30 or 35).

-- Gene


mudesheng commented 8 years ago

Thank you very much. I construct the De Bruijn graph with k-mer size 201, and some of the resulting edges are shorter than 210bp. A previous edge and its adjacent edge therefore overlap by 201bp and differ by fewer than 10 bases, and the read may also have a very low quality region at the same place. For this case I need to allow matches of 150~200bp.

Enumerating several paths is a good solution, but I'm afraid that edges in high-copy-number repeats will make the number of paths blow up.
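One possible way to keep the enumeration under control, as a rough sketch only: cap the number of edges per path, the number of times any single edge may be reused, and the total number of paths returned. The function `bounded_paths` and its parameters are hypothetical, and the successor-map representation of the graph is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bounded_paths(
    succ: Dict[str, List[str]],
    start: str,
    max_edges: int,
    max_paths: int,
    max_visits: int = 1,
) -> Tuple[List[List[str]], bool]:
    """Enumerate walks from `start` under three budgets.

    max_edges  : a walk stops growing once it has this many edges
    max_visits : an edge may appear at most this many times in one walk
    max_paths  : stop once this many walks have been collected

    Returns (paths, truncated); `truncated` is True when more walks
    existed than the `max_paths` budget allowed.
    """
    paths: List[List[str]] = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        visits = Counter(path)
        nexts = [n for n in succ.get(path[-1], []) if visits[n] < max_visits]
        if len(path) == max_edges or not nexts:
            if len(paths) == max_paths:
                return paths, True  # budget exhausted, report truncation
            paths.append(path)
            continue
        for n in nexts:
            stack.append(path + [n])
    return paths, False
```

When `truncated` comes back True for a start edge, that edge probably sits in a repeat, and it may be safer to skip it or handle it separately than to raise the budgets.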

Desheng