Open xubo245 opened 7 years ago
It is simply because we haven't integrated it yet for single-end data. Most of the data we have is paired-end, so I implemented paired-end first. It is not difficult to do single-end in the same way, but it needs time for integration and validation.
For paired-end, I think it works fine. We spent a significant amount of time validating paired-end data.
I can check with my previous lab members to see if they have bandwidth to work on that. I graduated last summer, so I am no longer continuing the effort to improve cs-bwamem. I will see if I have some time to work on this.
I also tried paired-end data, with 10000000 reads in each FASTQ file. Results:
CS-BWAMEM paired-end: 20000000 => 20256757 (1.28% more)
bwamem paired-end (using the bwa software): 2000000 => 2000323 (0.0162% more)
CS-BWAMEM produces more alignment results than native bwamem on paired-end data.
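The percentages quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `extra_alignment_pct` is made up for illustration, and the counts are the ones posted in this thread, taken as-is.

```python
def extra_alignment_pct(input_reads: int, output_records: int) -> float:
    """Percentage of output records in excess of the number of input reads."""
    return (output_records - input_reads) / input_reads * 100.0


# Counts as reported in the thread (input reads => output alignment records).
cs_bwamem_pct = extra_alignment_pct(20_000_000, 20_256_757)
bwamem_pct = extra_alignment_pct(2_000_000, 2_000_323)

print(f"CS-BWAMEM paired-end: {cs_bwamem_pct:.2f}% extra records")
print(f"bwamem paired-end:    {bwamem_pct:.5f}% extra records")
```

Note that the two runs use different input sizes as posted, so comparing the percentages is more meaningful than comparing the raw differences.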
Thank you for telling me the reason. If necessary, I will try to do single-end.
Do you have any documentation for CS-BWAMEM? I have read your doctoral thesis and the cs-bwamem poster (both the paper and the A1 poster), but they do not have enough detail. These days I have also read most of your Scala code in cs-bwamem, but I have many questions and there are a lot of places that I don't understand. Could you please send me any related design documents if you have them? Especially on the calculation of the map index and the extension.
Thank you very much.
I want to improve the extension performance of sequence alignment with more efficient SIMD technology. I have improved sequence alignment and achieved higher performance than the striped SW algorithm with SIMD, which bwamem also uses. Could you please give me some advice or suggestions about how to replace the sequence alignment in bwamem (the cs-bwamem JNI invocation)?
The striped SW reference (cited in your doctoral thesis) is: [68] Michael Farrar. Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics, 23(2):156-161, January 2007.
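When swapping in a different alignment kernel, a slow scalar reference can help check that the new kernel returns the same scores. The following is a minimal sketch of a plain (non-striped, non-SIMD) Smith-Waterman local alignment score with affine gaps. It is not the cs-bwamem or bwa code, and it does not model banding or early termination. The name `sw_score` is hypothetical, and the default scores (match 1, mismatch 4, gap open 6, gap extend 1) are an assumption chosen to resemble bwa mem's default settings; adjust them to whatever the kernel under test uses.

```python
def sw_score(query: str, target: str, match: int = 1, mismatch: int = 4,
             gap_open: int = 6, gap_extend: int = 1) -> int:
    """Best local alignment score (Smith-Waterman with affine gaps).

    A gap of length k costs gap_open + k * gap_extend (assumed convention).
    """
    neg = -10**9                      # stands in for "minus infinity"
    n = len(target)
    h_prev = [0] * (n + 1)            # H values of the previous row
    f_prev = [neg] * (n + 1)          # best score ending in a gap that skips query bases
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [neg] * (n + 1)
        e = neg                       # best score ending in a gap that skips target bases
        for j in range(1, n + 1):
            # Extend an existing gap or open a new one, in each direction.
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f_cur[j] = max(f_prev[j] - gap_extend,
                           h_prev[j] - gap_open - gap_extend)
            sub = match if query[i - 1] == target[j - 1] else -mismatch
            # Local alignment: scores are clamped at zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


# Hypothetical usage: compare against the score from the kernel under test.
reference = sw_score("A" * 10 + "C" * 10, "A" * 10 + "GGG" + "C" * 10)
print(reference)
```

A reference like this runs in O(len(query) * len(target)) time, so it is only suitable for spot checks on short sequences, not for benchmarking.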
Thank you very very very much.
I guess the doctoral thesis is the most detailed doc I have for now, but it may not have the most up-to-date information. We have a manuscript that we are planning to submit, but it also only covers high-level material with validation results. It would be better to discuss more detailed questions offline. You can send me an email (ytchen@cs.ucla.edu).
Thanks. I will consult you by email if we cannot solve the problem on GitHub.
Thank you again.
Question 2: Why doesn't single-end data use JNI for speedup? (JNI: bwamem uses SIMD.)