rachelss / SISRS

Site Identification from Short Read Sequences.

Using Bowtie across multiple nodes, if available #51

Open BobLiterman opened 6 years ago

BobLiterman commented 6 years ago

Hey @anderspitman,

One major bottleneck with SISRS is that the Bowtie mapping steps run serially rather than in parallel. In part, this is because Bowtie itself does not run in MPI mode and therefore cannot fully utilize multi-node systems.

However, it may be possible to use a Python package (e.g. https://pypi.python.org/pypi/dispy) to distribute parallel jobs across nodes. Suppose the user sets a flag for the number of processors (already exists) and another for the number of nodes (putative new flag). If we could figure out a way to adapt the individual Bowtie calls to submit 'N' jobs to 'N' nodes based on those flags, we could greatly speed up the mapping process.
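As a rough sketch of the idea (not existing SISRS code): build one Bowtie command per sample, then hand the commands to a pool that runs at most 'N' at a time. Everything named here is hypothetical, including `build_bowtie_command`, `map_samples`, the sample names, and the output paths. The `bowtie2` flags (`-p`, `-x`, `-U`, `-S`) are assumed for illustration and are not the exact calls in SISRS. The standard-library thread pool only runs jobs on one machine; it stands in for whatever multi-node scheduler ends up being used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_bowtie_command(index_prefix, reads_path, sam_path, processors):
    """Build one bowtie2 call (illustrative flags, not SISRS's exact ones)."""
    return [
        "bowtie2",
        "-p", str(processors),   # threads used by this single job
        "-x", index_prefix,      # prefix of the bowtie2 index
        "-U", reads_path,        # unpaired reads to map
        "-S", sam_path,          # SAM output file
    ]


def run_command(command):
    """Run one mapping job and return its exit code."""
    return subprocess.run(command, check=False).returncode


def map_samples(samples, index_prefix, processors, nodes, runner=run_command):
    """Dispatch one mapping job per sample, at most `nodes` at a time.

    `samples` is a list of (name, reads_path) pairs. `runner` is the hook
    a multi-node backend would replace; here it runs the job locally.
    """
    commands = [
        build_bowtie_command(index_prefix, reads, f"{name}.sam", processors)
        for name, reads in samples
    ]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        exit_codes = list(pool.map(runner, commands))
    return dict(zip((name for name, _ in samples), exit_codes))
```

Keeping the command construction separate from the dispatch means only `runner` and the pool would need to change to move from local execution to a cluster backend.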

This is a major benefit of the Python port, so we wanted to bring it up now as things are getting worked out.

Best, Bob

anderspitman commented 6 years ago

@BobLiterman on cursory inspection, I don't see any reason why we couldn't do this. Where exactly is the mapping code you're referring to in SISRS? You're not talking about mapContigs, right?

BobLiterman commented 6 years ago

The Bowtie commands in alignContigs and identifyFixedSites.

anderspitman commented 6 years ago

Came across scoop today. Might be a good alternative. Posting here to remember to check it out later.

BobLiterman commented 6 years ago

Looks promising.