Closed jimhester closed 11 years ago
Thank you very much for your interest and pointing out this issue.
This library (SSW) is not derived from Heng Li's Smith-Waterman (SW) implementation. The two are highly similar in the core part of the SW score matrix calculation because both are implementations of Farrar’s algorithm (please see Fig. 5 of Farrar, M., 2007, Striped Smith-Waterman speeds database searches six times over other SIMD implementations, Bioinformatics). The pseudo-code is given explicitly in this figure. Most implementations of Farrar’s algorithm, such as the SW implementation in Stampy (Lunter, G. and Goodson, M., 2011, Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads, Genome Res), use similar code in their core part. This algorithm is difficult to write in any other way; different implementations just add extra lines to record score matrix information for traceback. Whatever they add, the code for the matrix calculation is almost exactly the same.
I did read Heng Li’s klib before. When I saw the following macro:
(xx) = _mm_max_epu8((xx), _mm_srli_si128((xx), 8)); \
(xx) = _mm_max_epu8((xx), _mm_srli_si128((xx), 4)); \
(xx) = _mm_max_epu8((xx), _mm_srli_si128((xx), 2)); \
(xx) = _mm_max_epu8((xx), _mm_srli_si128((xx), 1)); \
(ret) = _mm_extract_epi16((xx), 0) & 0x00ff; \
} while (0)
I thought this was a great piece of code that could lead to extra efficiency, so I used this macro to replace the corresponding part of my program. As a student, I would also like to learn from experts.
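For readers unfamiliar with the idiom, the macro above finds the largest byte in a 16-byte vector by folding the vector onto itself: it shifts right by 8, 4, 2, and then 1 bytes, taking the byte-wise maximum after each shift, so that the lowest byte ends up holding the overall maximum. The following is a minimal, portable sketch of that reduction in plain C. It is not code from ksw or SSW. The names `shift_right_bytes`, `max_bytes`, and `horizontal_max_u8` are hypothetical, and scalar loops stand in for the SSE2 intrinsics `_mm_srli_si128` and `_mm_max_epu8`.

```c
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for _mm_srli_si128: shift a 16-byte vector right by
 * n bytes, filling the vacated high bytes with zeros. */
static void shift_right_bytes(const uint8_t in[16], uint8_t out[16], int n) {
    for (int i = 0; i < 16; i++)
        out[i] = (i + n < 16) ? in[i + n] : 0;
}

/* Scalar stand-in for _mm_max_epu8: byte-wise unsigned maximum, stored in a. */
static void max_bytes(uint8_t a[16], const uint8_t b[16]) {
    for (int i = 0; i < 16; i++)
        if (b[i] > a[i])
            a[i] = b[i];
}

/* Hypothetical helper: return the largest of 16 unsigned bytes using
 * four fold steps (shifts of 8, 4, 2, 1) instead of 15 comparisons in a row. */
static uint8_t horizontal_max_u8(const uint8_t v[16]) {
    uint8_t x[16], shifted[16];
    memcpy(x, v, sizeof x);
    for (int n = 8; n >= 1; n /= 2) {
        shift_right_bytes(x, shifted, n);
        max_bytes(x, shifted);
    }
    /* Byte 0 now holds the maximum; the macro reads it with
     * _mm_extract_epi16((xx), 0) & 0x00ff. */
    return x[0];
}
```

Calling `horizontal_max_u8` on any 16-byte array returns its largest element. In the SIMD version each fold step is a single shift and a single max instruction, which is where the efficiency comes from.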
I appreciate the MIT license issue you pointed out, and apologize for my lack of experience. I have added the license to the file ssw.c. I hope this resolves your concerns. Please feel free to let me know if you find any other problems with this library. Thank you again, sincerely, for your encouragement.
Good work on the library. I think it can be a valuable addition that will make it easy to add fast Smith-Waterman alignments to analysis pipelines. However, the lack of attribution to Heng Li in this library's documentation and the accompanying paper greatly concerns me.
This library seems to be derived from @lh3 (Heng Li)'s Smith-Waterman implementation in bwa and his standalone ksw. See https://github.com/attractivechaos/klib and https://github.com/lh3/bwa. This is apparent both from casual code perusal and from the plagiarism detection program moss: http://moss.stanford.edu/results/265225620
The MIT license included in the ksw release states:
I read this to mean that you are free to do what you want with the software provided you give proper attribution to the source, which you seem to have neglected to do.
I would like to use this library in my work, and I think it can be a valuable tool for the bioinformatics community at large. However, if this issue remains unresolved, I will not be able to do so in good conscience.