hallamlab / metapathways2

MetaPathways v2.0: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds
http://hallam.microbiology.ubc.ca/MetaPathways/

Problem with Grid setup #74

Closed rpelicae closed 7 years ago

rpelicae commented 9 years ago

So far we have been able to run MetaPathways locally with a limited number of input sequences against the provided databases and RefSeq. We would now like to run the FUNC_SEARCH stage on our compute cluster, which uses Sun Grid Engine as its scheduler. According to the manual and the MetaPathways publication, the pipeline should be able to externalize the (B)LAST job to this compute cluster, then collect and consolidate the results back on my local machine. However, I have not been able to get this working.

I have been digging into the code and noticed that there seem to be two ways to perform a (B)LAST search on a grid architecture. In template_param.txt I set ‘metapaths_steps: FUNC_SEARCH grid’. Starting from MetaPathways.py:

1. run_metapathways(samplesData, …, runid), which is imported from metapathways.py. Within this function, execute_tasks(s, …) is called on the sample object; it is imported from metapathways_pipeline.py. In that module, the status of the context (c.status) stored in the ContextBlock is checked, and if it is ‘grid’, the function blastgrid from BlastGrid.py is called with the argument c.commands[0]. Unfortunately, blastgrid(argv) expects a dictionary-type argument (‘for key, value in argv.items(): ‘), which c.commands[0] is not. So it does nothing and the pipeline moves on to the next block of stages.
2. After run_metapathways (which seems to be able to run the whole pipeline), there is a line ‘if blasting_system == grid’ that calls a blast_in_grid() function.
   This function is imported from the blast_using_grid module, which in turn refers to a BlastBroker object.
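
The type mismatch described in point 1 can be illustrated with a minimal sketch. This is not MetaPathways code: blastgrid_like and call_checked are hypothetical stand-ins, and the string used for the command is only an assumed shape for c.commands[0]. It shows why a loop over argv.items() cannot work on a non-dictionary argument, and how an explicit check would surface the problem instead of letting the stage pass silently.

```python
def blastgrid_like(argv):
    """Hypothetical stand-in for a function that expects a dict of options."""
    options = {}
    for key, value in argv.items():  # only works if argv is a dict
        options[key] = value
    return options


def call_checked(func, arg):
    """Call func(arg) only if arg is a dict; otherwise report the mismatch."""
    if not isinstance(arg, dict):
        return "skipped: expected dict, got " + type(arg).__name__
    return func(arg)


# Assumed shape of c.commands[0]: a plain command string, not a dict.
command = "blast command line as a single string"

print(call_checked(blastgrid_like, command))
print(call_checked(blastgrid_like, {"query": "sample.faa"}))
```

Calling blastgrid_like(command) directly would raise an AttributeError, because a string has no items() method. If the stage nevertheless appears to "do nothing", the error is presumably being caught or bypassed somewhere upstream.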

Both ways seem to aim for the same thing: connect to a server, upload the databases and executables, perform the (B)LAST search, and send the results back to the local machine. However, the second option seems to do this outside of the main pipeline, which looks strange to me, as I don’t see where the results would then be parsed.

Could you please explain in more detail how to set up MetaPathways correctly to externalize the (B)LAST search? Specifically, which PATH should be set for the executables and databases for the external (B)LAST, and how should the grid_engine (grid_engine0) and other parameters be set up (among others BLAST_REFDB, found in specialAlternatives in the Tools class?)

hallamlab commented 7 years ago

We are releasing a new version shortly, which should address this issue.