mengyao / Complete-Striped-Smith-Waterman-Library

Core dumps in multicore environment #66

Closed by andreirajkovic 2 years ago

andreirajkovic commented 4 years ago

Firstly, thank you all for uploading this critical implementation of the SSW! I am attempting to incorporate your C/C++ algorithm in a symmetric multiprocessing environment; however, I seem to be running into some issues, possibly with memory.

My program uses std::async to instruct multiple threads to access separate fastq files (or different parts of the same fastq). When the fastq is small (fewer than 100 reads), the program executes fine. However, when the fastq is larger, the program core dumps.

I've managed to isolate the problem to pointers that point to s_align and s_profile. Is it possible the memory is not being freed fast enough?

I look forward to hearing your response.

jeffdaily commented 4 years ago

The problem is most certainly not in SSW, but in how you're using it.

Make sure that each s_profile you create is created by a single thread (meaning, do not call ssw_init from multiple threads concurrently and assign the results to the same s_profile pointer). Make sure all threads are done using an s_profile before calling init_destroy() on it. The same goes for the s_align result. If you are calling ssw_align() from multiple threads concurrently, make sure each thread assigns its s_align result to a unique pointer. And don't call align_destroy() until all threads are done using the alignment result.
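The rules above can be sketched as follows. This is a minimal illustration, not SSW code: `Profile`, `Alignment`, `profile_create`, `align`, `profile_destroy`, and `alignment_destroy` are hypothetical stand-ins for s_profile, s_align, ssw_init, ssw_align, init_destroy, and align_destroy, so that the example compiles without ssw.h. It also assumes the alignment call only reads the profile. One thread creates the profile, each task gets its own result pointer, and the profile is destroyed only after every task has been joined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-ins for s_profile / s_align (assumption: not the real SSW types).
struct Profile { std::string read; };
struct Alignment { int score; };

Profile* profile_create(const std::string& read) { return new Profile{read}; }
void profile_destroy(Profile* p) { delete p; }

// Only reads the profile; returns a freshly allocated result owned by the caller.
// The "score" is a toy count of matching positions, not a Smith-Waterman score.
Alignment* align(const Profile* p, const std::string& ref) {
    int matches = 0;
    const std::size_t n = std::min(p->read.size(), ref.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (p->read[i] == ref[i]) ++matches;
    }
    return new Alignment{matches};
}
void alignment_destroy(Alignment* a) { delete a; }

std::vector<int> align_against_all(const std::string& read,
                                   const std::vector<std::string>& refs) {
    Profile* profile = profile_create(read);  // created once, by one thread

    // Each task returns its own result pointer; no pointer is shared between tasks.
    std::vector<std::future<Alignment*>> pending;
    for (const std::string& ref : refs) {
        pending.push_back(std::async(std::launch::async,
                                     [profile, &ref] { return align(profile, ref); }));
    }

    std::vector<int> scores;
    for (auto& task : pending) {
        Alignment* result = task.get();  // blocks until this task has finished
        scores.push_back(result->score);
        alignment_destroy(result);       // destroyed exactly once, by its only owner
    }

    assert(scores.size() == refs.size());
    profile_destroy(profile);  // safe: every task that used the profile was joined
    return scores;
}
```

Calling `align_against_all("ACGT", {"ACGT", "ACGA", "TTTT"})` runs three tasks against one shared, read-only profile and tears it down only after the last `get()` returns.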

I suspect you haven't written many "SMP" programs before and are now starting to learn some of the pitfalls associated with multi-threaded applications. Without a link to your code that we might review, we can only guess what the problem might be.

andreirajkovic commented 4 years ago

Thanks for the prompt response. Here is a link to the code: https://github.com/andreirajkovic/hashtable_multithreaded/blob/master/multithreaded_ssw.cpp. I apologize; it is a bit rough. I believe that, given the way async is calling the function, my s_profile and s_align should be generating unique pointers. I found that setting s_align and s_profile to NULL works for now, though I am not sure how effective a fix this is.

jeffdaily commented 4 years ago

One thing I notice: only call align_destroy() and init_destroy(). You should not call free() on s_align or s_profile or their members. The XXX_destroy() functions perform the deallocation for you, so do not free the cigar yourself, etc.
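One way to make the "destroy exactly once, never free members by hand" rule hard to break in C++ is to hand the result to a std::unique_ptr with a custom deleter. The sketch below uses a hypothetical `Alignment` type and `alignment_create` / `alignment_destroy` functions as stand-ins for s_align, ssw_align, and align_destroy (they are assumptions for illustration, not the library's API). The counter `g_destroy_calls` exists only to show that the destroy function runs once.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for s_align: a struct that owns a heap-allocated member.
struct Alignment {
    int* cigar;
    int cigar_len;
};

int g_destroy_calls = 0;  // demo-only counter, not part of any real API

Alignment* alignment_create(int cigar_len) {
    Alignment* a = new Alignment;
    a->cigar = new int[cigar_len]();
    a->cigar_len = cigar_len;
    return a;
}

// Plays the role of the library's destroy function: it releases the members
// and the struct itself, so callers must not release either of them again.
void alignment_destroy(Alignment* a) {
    delete[] a->cigar;
    delete a;
    ++g_destroy_calls;
}

// Deleter that routes unique_ptr cleanup through the destroy function.
struct AlignmentDeleter {
    void operator()(Alignment* a) const { alignment_destroy(a); }
};
using AlignmentPtr = std::unique_ptr<Alignment, AlignmentDeleter>;

int use_alignment_once() {
    AlignmentPtr result(alignment_create(3));
    assert(result != nullptr);
    int len = result->cigar_len;
    // No manual release of result->cigar or of the struct here:
    // the deleter calls alignment_destroy() once when result goes out of scope.
    return len;
}
```

With real SSW code the same shape would wrap the pointer returned by ssw_align() and call align_destroy() in the deleter; that mapping is an assumption about how you would adapt the sketch, not something taken from the library's documentation.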

jeffdaily commented 4 years ago

Otherwise, creating the s_profile and the s_align within the body of a std::async() function is good practice. You are creating the profile and the alignment local to the thread, and freeing them local to the thread. There is no data shared with other threads.
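A minimal sketch of that thread-local pattern, assuming each task handles one chunk of reads: the profile and the alignment are created, used, and destroyed inside the task, and only a plain int crosses the thread boundary. `Profile`, `Alignment`, and the four helper functions are hypothetical stand-ins for the SSW types and calls, and `g_live_objects` is a demo-only counter used to check that nothing is leaked.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

std::atomic<int> g_live_objects{0};  // demo-only leak check

// Hypothetical stand-ins for s_profile / s_align (assumption: not the real SSW types).
struct Profile { std::string read; };
struct Alignment { int score; };

Profile* profile_create(const std::string& read) {
    ++g_live_objects;
    return new Profile{read};
}
void profile_destroy(Profile* p) { delete p; --g_live_objects; }

// Toy score: number of positions where read and reference agree.
Alignment* align(const Profile* p, const std::string& ref) {
    int matches = 0;
    const std::size_t n = std::min(p->read.size(), ref.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (p->read[i] == ref[i]) ++matches;
    }
    ++g_live_objects;
    return new Alignment{matches};
}
void alignment_destroy(Alignment* a) { delete a; --g_live_objects; }

// Runs inside one task. Every allocation is made and released here,
// so no profile or alignment pointer is ever visible to another thread.
int best_score_in_chunk(const std::vector<std::string>& reads,
                        std::size_t begin, std::size_t end,
                        const std::string& ref) {
    int best = 0;
    for (std::size_t i = begin; i < end; ++i) {
        Profile* profile = profile_create(reads[i]);
        Alignment* result = align(profile, ref);
        best = std::max(best, result->score);
        alignment_destroy(result);
        profile_destroy(profile);
    }
    return best;
}

int best_score_parallel(const std::vector<std::string>& reads,
                        const std::string& ref, std::size_t num_tasks) {
    if (num_tasks == 0) num_tasks = 1;
    const std::size_t chunk = (reads.size() + num_tasks - 1) / num_tasks;

    std::vector<std::future<int>> tasks;
    for (std::size_t begin = 0; begin < reads.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, reads.size());
        tasks.push_back(std::async(std::launch::async, [&reads, &ref, begin, end] {
            return best_score_in_chunk(reads, begin, end, ref);
        }));
    }

    int best = 0;
    for (auto& task : tasks) best = std::max(best, task.get());
    assert(g_live_objects.load() == 0);  // every task cleaned up after itself
    return best;
}
```

The input vector and the reference string are shared between tasks, but only for reading, and both outlive every `get()` call.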

If you remove the extra free() calls, does this resolve your issue? If not, I would continue to look at how you are using SSW rather than suspecting SSW itself of causing the seg fault.

andreirajkovic commented 4 years ago

Calling align_destroy() and init_destroy() still resulted in seg faults. So far, setting profile and results to NULL after they've been called is the only thing that remedies this.

jeffdaily commented 4 years ago

Then I would suggest running your code under a debugger such as gdb. It will pause on the seg fault, at which point you can produce a backtrace showing exactly where the seg fault occurred. If you can provide the backtrace, it should show whether the error is indeed caused by SSW or whether it is caused by the caller passing in bad parameters.

andreirajkovic commented 4 years ago

I will do that then! Thanks Jeff!

andreirajkovic commented 4 years ago

Are the differing read and ref lengths causing the issue? I think my memory allocation is tied to the read length, as they should both be the same size.

banded_sw (n=, mat=, band_width=4194304, weight_gapE=, weight_gapO=3, score=, readLen=144, refLen=143, read=XXXXXXX "", ref=XXXXXXX "") at ssw.c:646
646 direction_line[de] = temp1 > temp2 ? 3 : 2;

jeffdaily commented 4 years ago

Were you able to produce a traceback for debugging? If you are getting a seg fault during banded_sw, I am not familiar with that portion of the code so I won't be able to help you further.