sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Re-implement the code that reads input files in parallel. #230

Open sebhtml opened 10 years ago

sebhtml commented 10 years ago

The code that counts the sequences in each file (Partitioner) is fine.

But after that, the code that reads the sequences from the files is not very good.

The problem is that too many processes are reading the same file at once.

The code can't really use MPI I/O for that directly because (I think) the MPI I/O functions involved, such as MPI_File_open, are collective.

One thing that would be great:

Have just 1 process take care of each file and dispatch its sequences to the other ranks / actors.

code/SequencesLoader/SequencesLoader.cpp
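
A minimal sketch of the one-reader-per-file idea, not the actual SequencesLoader code. It assumes FASTA input, a round-robin policy for both file ownership and sequence placement, and a hypothetical `SendFn` callback standing in for the real message send (an MPI send or an actor message). The names `ownerOfFile` and `dispatchFasta` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the real send (MPI message or actor message):
// arguments are the destination rank and the sequence to deliver.
using SendFn = std::function<void(int, const std::string&)>;

// Assumption: file i is owned by rank (i % numRanks), so each file
// is opened and read by exactly one process.
inline int ownerOfFile(std::size_t fileIndex, int numRanks) {
    assert(numRanks > 0);
    return static_cast<int>(fileIndex % static_cast<std::size_t>(numRanks));
}

// Called only by the owner of the file. Reads FASTA records from `in` and
// hands sequence k to rank (k % numRanks). Returns the number dispatched.
inline std::size_t dispatchFasta(std::istream& in, int numRanks, const SendFn& send) {
    assert(numRanks > 0);
    std::string line;
    std::string sequence;
    std::size_t count = 0;
    bool inRecord = false;

    // Send the record accumulated so far, if any, then reset the buffer.
    auto flush = [&]() {
        if (inRecord) {
            send(static_cast<int>(count % static_cast<std::size_t>(numRanks)), sequence);
            ++count;
        }
        sequence.clear();
    };

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {      // a header line starts a new record
            flush();
            inRecord = true;
        } else if (inRecord) {     // sequence lines may span several lines
            sequence += line;
        }
    }
    flush();                       // the last record has no following header
    return count;
}
```

With this shape, the other ranks never open the file; they only receive sequences. Whether round-robin is the right placement for Ray is an open question, since the Partitioner may already define which rank should hold which sequence.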

Metadata has to be sent along with each sequence too (LEFT_READ, RIGHT_READ, PAIR MATE and so on).

This is not trivial because code/SequencesLoader/SequencesLoader.cpp is quite ugly.
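
To make the metadata requirement concrete, here is a rough sketch of packing one read plus its pairing information into a single flat buffer that could travel in one message. Everything in it is an assumption: the `ReadType` enum only mirrors the LEFT_READ / RIGHT_READ tags with made-up values, `ReadMessage`, `pack` and `unpack` are hypothetical names, and the layout uses host byte order, which presumes all ranks run on the same architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical tags; the numeric values are invented for this sketch.
enum class ReadType : std::uint8_t { Unpaired = 0, LeftRead = 1, RightRead = 2 };

struct ReadMessage {
    ReadType type;            // which end of the pair this read is
    std::uint64_t mateIndex;  // global index of the mate; ignored when Unpaired
    std::string sequence;
};

// Layout: 1 byte type | 8 bytes mateIndex | 4 bytes length | sequence bytes.
inline std::vector<unsigned char> pack(const ReadMessage& m) {
    const std::uint32_t length = static_cast<std::uint32_t>(m.sequence.size());
    std::vector<unsigned char> buffer(1 + sizeof(m.mateIndex) + sizeof(length) + length);
    std::size_t offset = 0;
    buffer[offset++] = static_cast<unsigned char>(m.type);
    std::memcpy(buffer.data() + offset, &m.mateIndex, sizeof(m.mateIndex));
    offset += sizeof(m.mateIndex);
    std::memcpy(buffer.data() + offset, &length, sizeof(length));
    offset += sizeof(length);
    if (length > 0) {
        std::memcpy(buffer.data() + offset, m.sequence.data(), length);
    }
    return buffer;
}

// Inverse of pack(); asserts that the buffer has exactly the expected size.
inline ReadMessage unpack(const std::vector<unsigned char>& buffer) {
    const std::size_t headerSize = 1 + sizeof(std::uint64_t) + sizeof(std::uint32_t);
    assert(buffer.size() >= headerSize);
    ReadMessage m;
    std::size_t offset = 0;
    m.type = static_cast<ReadType>(buffer[offset++]);
    std::memcpy(&m.mateIndex, buffer.data() + offset, sizeof(m.mateIndex));
    offset += sizeof(m.mateIndex);
    std::uint32_t length = 0;
    std::memcpy(&length, buffer.data() + offset, sizeof(length));
    offset += sizeof(length);
    assert(buffer.size() == headerSize + length);
    m.sequence.assign(reinterpret_cast<const char*>(buffer.data() + offset), length);
    return m;
}
```

Keeping the pairing fields in the same buffer as the sequence means the receiving rank never sees a read without its metadata, at the cost of a 13-byte header per read in this layout.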