thegenemyers / DALIGNER

Find all significant local alignments between reads
Other
138 stars 61 forks source link

Thread setup for lex_sort #79

Closed JEzpeleta closed 6 years ago

JEzpeleta commented 6 years ago

Note: This is not necessarily a bug/issue, but more of a question at this point.

I have made some changes to how tuple lists are built in tuple_thread. These appear to work fine, and everything downstream of tuple_thread works fine when NTHREADS=1. However, when I use multi-threading (I have tried NTHREADS=2 or NTHREADS=4), the lex_sort after tuple building (mersort) fails. Specifically, I get a list that is almost sorted but contains zeros and a few elements at apparently random positions which are not sorted.

The tuple list I am supplying is identical in either case (with NTHREADS=1 or NTHREADS=4), so my current guess is that the issue must be in how I set up the .beg and .end parameters for lex_sort. My current approach is to take the list length (which is NOT equal to block->reads[nreads].boff - Kmer * nreads in my case) and divide that in equal parts (so if NTHREADS=2 and I have a list with 4000000 tuples, I set parmx[0].beg=0, parmx[0].end=2000000, parmx[1].beg=2000000, parmx[1].end=4000000).

In any case, my specific question is whether there are any requirements about how parmx must be set for lex_sort to work correctly. Maybe the program expects the number of elements assigned to each thread to be a multiple of a given number, or something along those lines. Any help or thoughts are welcome.

I hope I have been clear. Thank you in advance for you help.

thegenemyers commented 6 years ago

Unfortunately the interface to the lexicographical sort is not particularly modular as I was trying to get every bit of speed possible out of it.

Basically, not only do .beg/.end need to be set up, but so does .tptr[x] for each bucket x, i.e. you have to know already how many elements will be moved to each radix bucket by each thread. If you look carefully at tuple_thread where the tuple lists are being set up, that this counting is being down with the line of code 'kptr[c & BMASK] += 1;'. That doesn't fully answer the question of how I do this, but hopefully you'll now be able to piece together the appropriate setup.

BTW, may I ask why you are changing it? I have already done numerous experiments with minimizers, gapped seeds, homo-polymer compression, etc. You can make it faster, but always loose sensitivity unless you are very careful. I had one combination where I saw a 30% improvement and I may at some point place it into the code base.

-- Gene

On 4/16/18, 5:16 PM, JEzpeleta wrote:

Note: This is not necessarily a bug/issue, but more of a question at this point.

I have made some changes to how tuple lists are built in tuple_thread. These appear to work fine, and everything downstream of tuple_thread works fine when NTHREADS=1. However, when I use multi-threading (I have tried NTHREADS=2 or NTHREADS=4), the lex_sort after tuple building (mersort) fails. Specifically, I get a list that is /almost/ sorted but contains zeros and a few elements at apparently random positions which are not sorted.

The tuple list I am supplying is identical in either case (with NTHREADS=1 or NTHREADS=4), so my current guess is that the issue must be in how I set up the .beg and .end parameters for lex_sort. My current approach is to take the list length (which is NOT equal to block->reads[nreads].boff - Kmer * nreads in my case) and divide that in equal parts (so if NTHREADS=2 and I have a list with 4000000 tuples, I set parmx[0].beg=0, parmx[0].end=2000000, parmx[1].beg=2000000, parmx[1].end=4000000).

In any case, my specific question is whether there are any requirements about how parmx must be set for lex_sort to work correctly. Maybe the program expects the number of elements assigned to each thread to be a multiple of a given number, or something along those lines. Any help or thoughts are welcome.

I hope I have been clear. Thank you in advance for you help.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/thegenemyers/DALIGNER/issues/79, or mute the thread https://github.com/notifications/unsubscribe-auth/AGkkNsLtMdXvpmg14aSNFHmW0ZVw593Pks5tpLW5gaJpZM4TWu4p.