kosta777 / parallel-genomeseq

Parallelization of popular genome sequencing algorithms

Coarse mpi #15

Closed kosta777 closed 4 years ago

kosta777 commented 4 years ago

Please wrap it in a {} block; otherwise la won't be released.

Yeah, I missed a curly block there, but won't it still be released immediately (in the same iteration) since it is inside the while loop block?

huanglangwen commented 4 years ago

I think so, but in the previous memory leak case it didn’t work, so it might be better to wrap it in a block explicitly.

kosta777 commented 4 years ago

Hmm, that's weird. Yeah, I will wrap it just to be sure.
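
For reference, a minimal sketch of the scoping rule being discussed. `Tracked` is a hypothetical stand-in for the real `la` object, not code from this PR; it only counts live instances so the release point can be observed. Under standard C++ rules an automatic object declared in a while-loop body is destroyed at the end of every iteration, and an explicit inner block just moves that point earlier. It does not explain the earlier leak, which may have had a different cause.

```cpp
// Hypothetical stand-in for `la`: counts how many instances are currently alive.
int g_alive = 0;

struct Tracked {
    Tracked() { ++g_alive; }
    ~Tracked() { --g_alive; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};

// Declares `la` directly in the loop body and returns the largest number of
// simultaneously alive objects observed across all iterations.
int max_alive_in_loop(int iterations) {
    int max_alive = 0;
    int i = 0;
    while (i < iterations) {
        Tracked la;  // constructed anew in each iteration
        if (g_alive > max_alive) max_alive = g_alive;
        ++i;
    }                // la is destroyed here, at the end of every iteration
    return max_alive;
}

// Wraps `la` in an explicit inner block and returns how many objects are
// alive in the part of the iteration that runs after that block.
int alive_after_inner_block() {
    int seen = -1;
    int i = 0;
    while (i < 1) {
        {
            Tracked la;
        }                // la is destroyed here, before the rest of the iteration
        seen = g_alive;  // later work in the same iteration no longer holds la
        ++i;
    }
    return seen;
}
```

The explicit block matters mainly when more work follows in the same iteration and `la` should be released before it starts.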

huanglangwen commented 4 years ago

I think we need a C++ function that reads the fq file directly: simply parse the csv file by finding the positions of the right commas and then using substr. The only elements we need from the fq file are the sample sequence and the ground-truth position. Then we could run SWAligner and compare the output position with the reference position on the fly. This would make the program a lot cleaner.
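
A rough sketch of what such a parser could look like, using only `std::string::find` and `substr`. The column layout is an assumption: `seq_col` and `pos_col` are hypothetical 0-based column indices, and `ReadRecord`, `field_at`, and `parse_line` are illustrative names, not existing functions in this repository.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical record type: only the two fields mentioned above.
struct ReadRecord {
    std::string sequence;  // sample sequence
    long position;         // ground-truth position
};

// Returns the idx-th comma-separated field (0-based), or nullopt if the line
// has fewer fields than that.
std::optional<std::string> field_at(const std::string& line, std::size_t idx) {
    std::size_t start = 0;
    for (std::size_t i = 0; i < idx; ++i) {
        const std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) return std::nullopt;
        start = comma + 1;  // skip past this comma
    }
    const std::size_t end = line.find(',', start);
    const std::size_t len =
        (end == std::string::npos) ? std::string::npos : end - start;
    return line.substr(start, len);
}

// Extracts the sequence and position from one line. Returns nullopt if a
// column is missing or the position is not a plain integer.
std::optional<ReadRecord> parse_line(const std::string& line,
                                     std::size_t seq_col, std::size_t pos_col) {
    const auto seq = field_at(line, seq_col);
    const auto pos = field_at(line, pos_col);
    if (!seq || !pos) return std::nullopt;
    try {
        std::size_t used = 0;
        const long value = std::stol(*pos, &used);
        if (used != pos->size()) return std::nullopt;  // trailing garbage
        return ReadRecord{*seq, value};
    } catch (const std::exception&) {  // invalid_argument or out_of_range
        return std::nullopt;
    }
}
```

With an assumed layout of `id,sequence,position`, a call such as `parse_line(line, 1, 2)` would return the pair that could then be passed to SWAligner and compared against its output.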

kosta777 commented 4 years ago

What you are describing cannot be done in parallel via MPI: MPI file reads are binary reads, so each process needs to know how many bytes it has to read. I am not saying that we get any performance benefit from reading in parallel, but I think it would be interesting to compare the two approaches and look into it when we have more time. Additionally, I would rather we preprocess all inputs before working on them, so that we can strip out everything we do not need (such as IDs, indices or their "ground truth positions"), than have those C++ read function calls everywhere we use reads.

kosta777 commented 4 years ago

The code would also be much cleaner if we knew that, everywhere we need our reads, they are stored in all input files in the same way: only the info we need, which is a single 125-char read. This is also related to the "input files" issue I opened recently, so it might be more beneficial to discuss it there instead of on this pull request, which does not have much to do with it.
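
To illustrate why fixed-size records help with byte-based parallel reads, here is a small sketch that splits a file of equal-length records across ranks. It is plain C++ with no MPI calls; `ByteRange` and `rank_byte_range` are hypothetical names, and the record size is an assumption (for example 126 bytes if each 125-char read is followed by a single newline).

```cpp
#include <cstdint>

// Byte range of the input file assigned to one rank.
struct ByteRange {
    std::uint64_t offset;  // first byte this rank reads
    std::uint64_t count;   // number of bytes this rank reads
};

// Splits num_records fixed-size records as evenly as possible over num_ranks.
// The first (num_records % num_ranks) ranks receive one extra record.
ByteRange rank_byte_range(std::uint64_t num_records, int rank, int num_ranks,
                          std::uint64_t record_bytes) {
    const std::uint64_t ranks = static_cast<std::uint64_t>(num_ranks);
    const std::uint64_t r = static_cast<std::uint64_t>(rank);
    const std::uint64_t base = num_records / ranks;
    const std::uint64_t extra = num_records % ranks;
    // Records owned by lower ranks: r * base plus one for each lower rank
    // that received an extra record.
    const std::uint64_t first = r * base + (r < extra ? r : extra);
    const std::uint64_t mine = base + (r < extra ? 1 : 0);
    return ByteRange{first * record_bytes, mine * record_bytes};
}
```

Each rank could hand its `offset` and `count` to an MPI-IO call such as `MPI_File_read_at`. Because every record has the same length, no rank has to scan for delimiters to find where its share begins.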