hallamlab / metapathways2

MetaPathways v2.0: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds
http://hallam.microbiology.ubc.ca/MetaPathways/

How to analyse a large dataset remotely #88

Closed ademenez closed 7 years ago

ademenez commented 8 years ago

Dear Metapathways developers,

I was wondering whether you could give me some advice on how to conduct metagenomics analysis using Metapathways. I have a very large dataset of 183 samples acquired using the Illumina HiSeq sequencer, from soil. These sequences have already been quality checked using the EBI pipeline.

My hope was to annotate the sequences both in terms of function and taxonomy using Metapathways. Currently my setup is a Windows computer connected to a remote server. However, since I can only access the remote server through a VPN connection, the connection is very slow, which makes it difficult to execute operations remotely, particularly using the GUI.

My problem is that since the dataset is fairly large, I think I need to run the Blast analyses remotely (using the command line) and then import the outputs into a locally run Metapathways to perform further analyses using the GUI. Would you say that Metapathways is an appropriate tool to analyse this large dataset, and would it be manageable to analyse a dataset like this using the GUI?

Any advice would be much appreciated.

Alexandre

ariahahn commented 8 years ago

Hi Alexandre,

If your data is assembled, then yes. If it's not, I would recommend using SOFA (https://github.com/hallamlab/SOFA). I would also recommend using LAST+ (https://github.com/hallamlab/LAST-Plus) rather than BLAST, as LAST is much faster.

Hope that helps,

Aria

Aria Hahn MSc PhD Candidate www.cupcakesandscience.com Hallam Laboratory http://www.cmde.science.ubc.ca/hallam/index.php

University of British Columbia Department of Microbiology & Immunology Life Sciences Centre 2350 Health Sciences Mall (Rm 2520) Vancouver, BC Canada V6T 1Z3 T: 604-827-4216 F: 604-822-6041 ariahahn@interchange.ubc.ca

ademenez commented 8 years ago

Hi Aria,

Thank you very much for the reply. I have another quick question: I already have fasta files, which I obtained through the EBI analysis pipeline. Would SOFA work with those, or does it require fastq files? In other words, can I skip some stages of the SOFA processing pipeline?

Best wishes,

Alex

ariahahn commented 8 years ago

Yes, but it could be tricky. You'd have to set up your file structure as SOFA expects it to be and then ask for only the steps you want (see the wiki). However, I would still recommend merging paired-end reads when possible. It will make your ORF prediction a lot better (i.e., you'll get higher quality ORFs and more and better annotation).

Aria

taltman commented 8 years ago

Hi Alex,

I highly recommend that you use MetaPathways in batch mode on the remote machine, and run the job from within a 'screen' session, so that even if your VPN cuts out, your batch MP run will keep on chugging.
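As a rough illustration of the 'screen' approach, here is a minimal Python sketch that builds the command for starting a job in a detached GNU screen session, plus the command for reattaching later. The session name `mp_run` and the wrapper script `run_metapathways_batch.sh` are hypothetical placeholders, not part of MetaPathways; substitute whatever batch invocation you actually use.

```python
import shlex


def screen_command(session_name, batch_command):
    """Build the argv that starts batch_command in a detached screen session."""
    # -d -m starts screen detached; -S names the session so it can be found later.
    return ["screen", "-dmS", session_name, "bash", "-c", batch_command]


def reattach_command(session_name):
    """Build the argv that reattaches to a named screen session."""
    return ["screen", "-r", session_name]


# Hypothetical placeholder: replace with your real MetaPathways batch command.
batch = "./run_metapathways_batch.sh > mp_run.log 2>&1"

# Print the commands so they can be inspected before being run on the server.
print(shlex.join(screen_command("mp_run", batch)))
print(shlex.join(reattach_command("mp_run")))
```

On the remote machine the first list could be passed to `subprocess.run(..., check=True)`. Because the session is detached from the login shell, a dropped VPN connection should not interrupt the job, and the log file gives a way to check progress after reconnecting.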

Once it is done computing, you can copy the result to a local machine (e.g., your laptop) for local interaction with the MP GUI.
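One possible way to do the copy step, sketched in Python under the assumption that `rsync` is available on both ends (on Windows this may mean WSL or a similar environment). The host name and directory paths below are hypothetical placeholders.

```python
import shlex


def rsync_pull_command(remote_host, remote_dir, local_dir):
    """Build the argv that pulls the contents of remote_dir into local_dir."""
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    source = f"{remote_host}:{remote_dir.rstrip('/')}/"
    # -a preserves the file tree, -v is verbose, -z compresses in transit,
    # --partial keeps partly transferred files so a retry can resume them.
    return ["rsync", "-avz", "--partial", source, local_dir]


# Hypothetical host and paths: replace with your own server and output folder.
cmd = rsync_pull_command("user@remote.example.org", "/data/mp_output", "mp_output")
print(shlex.join(cmd))
```

The `--partial` and `-z` options are chosen here because the thread describes a slow VPN link; rerunning the same command after a dropped connection skips files that already arrived intact.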

Let me know if you have any questions.

Cheers,

~Tomer

P.S. I'm not an MP dev, but I'm a big fan!