Questions 2: Why didn't single-end data use JNI to speed up?

ytchen0323 / cloud-scale-bwamem

Apache License 2.0

15 stars · 9 forks

Issue #16 · Open · opened by xubo245 7 years ago

xubo245 commented 7 years ago

Question 2: Why doesn't the single-end pipeline use JNI to speed things up? (Through JNI, the native bwamem code uses SIMD.)

ytchen0323 commented 7 years ago

It is simply because we haven't integrated it for single-end data yet. Most of the data we have is paired-end, so I implemented paired-end first. Doing single-end the same way is not difficult; it just needs time for integration and validation.

For paired-end, I think it works fine. We spent a significant amount of time validating paired-end data.

I can check with my former lab members to see whether they have bandwidth to work on it. I graduated last summer, so I am no longer working on improving cs-bwamem, but I will see if I have some time to work on this.

xubo245 commented 7 years ago

I also tried paired-end data, with 10000000 reads in each FASTQ file. Results (input reads ⇒ output alignment records, relative increase): CS-BWAMEM paired-end: 20000000 ⇒ 20256757 (1.28%); bwamem paired-end (using the bwa software): 2000000 ⇒ 2000323 (0.0162%).

So CS-BWAMEM produces more alignment results than native bwamem on paired-end data.
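The percentages above are the excess of output alignment records over input reads, relative to the number of input reads. A quick check of that arithmetic, using the figures exactly as reported (the function name is only illustrative):

```python
def extra_record_rate(input_reads: int, output_records: int) -> float:
    """Percentage by which output alignment records exceed input reads."""
    return (output_records - input_reads) / input_reads * 100


# (input reads, output alignment records) as reported in the comment.
runs = {
    "CS-BWAMEM paired-end": (20000000, 20256757),
    "bwamem paired-end": (2000000, 2000323),
}

for name, (reads, records) in runs.items():
    print(f"{name}: {extra_record_rate(reads, records):.2f}%")
```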

Thank you for telling me the reason. If necessary, I will try to implement single-end.

Do you have any documentation for CS-BWAMEM? I have read your doctoral thesis and the cs-bwamem poster (both the paper version and the A1 poster), but they do not go into enough detail. Recently I have also read most of your Scala code in cs-bwamem, but I still have many questions and there are many places I don't understand. If you have any design documents, could you please send them to me? I am especially interested in the computation of the map index and the extension.

Thank you very much.

xubo245 commented 7 years ago

I want to improve the performance of the extension step of sequence alignment with more efficient SIMD techniques. I have improved sequence alignment and achieved higher performance than the striped SW algorithm with SIMD, which bwamem also uses. Could you give me some advice or suggestions on how to replace the sequence alignment in bwamem (which cs-bwamem invokes through JNI)?
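Striped SW and other SIMD schemes vectorize the same dynamic-programming recurrence, so a replacement kernel can be validated score-for-score against a plain scalar version. Below is a minimal sketch of local Smith-Waterman with affine gaps (Gotoh's H/E/F formulation). The function name is hypothetical, and the scoring values are assumptions chosen to resemble BWA-MEM's defaults (match 1, mismatch 4, gap open 6, gap extend 1, with a gap of length k costing open + k × extend). BWA-MEM's seed extension is a banded variant anchored at the seed end, so this shows only the core recurrence, not its extension routine.

```python
def smith_waterman_affine(query: str, target: str,
                          match: int = 1, mismatch: int = 4,
                          gap_open: int = 6, gap_extend: int = 1) -> int:
    """Best local alignment score; a gap of length k costs gap_open + k * gap_extend."""
    neg = -(10 ** 9)          # stands in for minus infinity
    n = len(target)
    h_prev = [0] * (n + 1)    # H values of the previous query row
    f_prev = [neg] * (n + 1)  # F values (vertical gaps) of the previous row
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [neg] * (n + 1)
        e = neg               # E value (horizontal gap) carried along this row
        for j in range(1, n + 1):
            # Open a new gap from H or extend an existing one.
            e = max(h_cur[j - 1] - gap_open - gap_extend, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open - gap_extend,
                           f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + (match if query[i - 1] == target[j - 1]
                                    else -mismatch)
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


print(smith_waterman_affine("A" * 20, "A" * 10 + "CC" + "A" * 10))
```

In the usage line, spanning the two-base insertion (20 matches minus a gap cost of 6 + 2 × 1) scores higher than aligning either run of ten A's alone.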

The striped SW reference (as cited in your doctoral thesis) is: [68] Michael Farrar. Striped Smith–Waterman speeds database searches six times over other SIMD implementations. Bioinformatics, 23(2):156–161, January 2007.
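The core idea of Farrar's striped layout is how query positions are assigned to vector lanes. With a query of length m and p lanes per SIMD register, the query is split into segments of length ceil(m / p), and position i is placed in vector i mod segLen, lane i div segLen. Neighbouring lanes in one vector are therefore a full segment apart, which keeps most vertical-gap dependencies out of the inner loop. A minimal sketch of that mapping only (the function name is hypothetical; padding lanes are shown as `None`, and how they are filled in a real query profile is left out):

```python
import math
from typing import List, Optional


def striped_layout(query_len: int, lanes: int) -> List[List[Optional[int]]]:
    """Return layout[v][lane] = query position stored there, or None for padding."""
    seg_len = math.ceil(query_len / lanes)
    layout: List[List[Optional[int]]] = [[None] * lanes for _ in range(seg_len)]
    for i in range(query_len):
        # Position i goes to vector (i mod segLen), lane (i div segLen).
        layout[i % seg_len][i // seg_len] = i
    return layout


for vec in striped_layout(10, 4):
    print(vec)
```

For a 10-base query and 4 lanes, the three vectors hold positions 0, 3, 6, 9 / 1, 4, 7 / 2, 5, 8, with the last lane of the second and third vectors left as padding.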

Thank you very very very much.

ytchen0323 commented 7 years ago

I guess the doctoral thesis is the most detailed document I have for now, but it may not have the most up-to-date information. We have a manuscript that we are planning to submit, but it also covers only high-level material along with validation results. For more detailed questions, it would be better to discuss offline. You can send me an email (ytchen@cs.ucla.edu).

xubo245 commented 7 years ago

Thanks. I will email you if we cannot solve the problem on GitHub.

Thank you again.