marbl / metAMOS

A metagenomic and isolate assembly and analysis pipeline built with AMOS
http://marbl.github.io/metAMOS

run_pipeline_test.sh #195

Closed FrankLee-1987 closed 9 years ago

FrankLee-1987 commented 9 years ago

I need help understanding the following:

1) What does the `500:3500` in this command refer to? `../initPipeline -f -m carsonella_pe_filt.fna -d test1 -i 500:3500`
2) I want to customize the software for my dataset: Assemble (MetaVelvet, SOAPdenovo2), FindORFS (MetaGeneMark), Validate (QUAST), Annotate (FCP). I hope these are the runPipeline parameters I need to provide for the above: `runPipeline -a MetaVelvet,soap -c FCP -g MetaGeneMark -X quast -p 15 -d test1 -k 55 -f Assemble,MapReads,FindORFS,Annotate,FunctionalAnnotation,Propagate,Classify,Abundance,FindScaffoldORFS -n FunctionalAnnotation`
3) What is the significance of workflows like core, iMetAMOS, optional, and deprecated? When should these workflows be used?
4) What I understood is that run_pipeline_test.sh is not invoking any workflow; it is a customized analysis.

skoren commented 9 years ago
  1. The size is the minimum and maximum insert size of the library.
  2. Yes, the parameters are correct, but QUAST requires a reference to validate, so metAMOS does not use it to select a winner (depending on how distant the available reference is, it may not be a good metric for assembly quality). If metAMOS can recruit a suitable reference, it will be used for validation and included in the reports.
  3. The workflows are described in the documentation: http://metamos.readthedocs.org/en/v1.5rc3/content/workflows.html Workflows consist of two parts. First, the programs they depend on are installed by INSTALL.py when you set up metAMOS; including iMetAMOS, for example, adds all the assemblers and validators to the list of tools to install. Second, they define parameters and, optionally, data to run. This simplifies the runPipeline command if you don't want to specify multiple assemblers/validation tools/etc. every time. There is an example custom workflow included in the Test folder (test_ima.ini), which also records the data used for the analysis to ease reproducibility. Workflows are optional and don't have to be used; they are meant to encapsulate common analysis tasks that are run repeatedly.
  4. Yes, every run of runPipeline starts as a core workflow and then parameters are modified. These can be modified on the command line (as in run_pipeline_test) or using a workflow (as in test_ima).
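For illustration, the two approaches described above look roughly like this on the command line. The `-w` workflow-selection flag here is my assumption from the documentation, not something stated in this thread; verify it with `runPipeline -h` on your installation:

```shell
# Option 1: set everything on the command line (as run_pipeline_test.sh does),
# modifying the core workflow's parameters directly.
runPipeline -a MetaVelvet,soap -c FCP -g MetaGeneMark -X quast -p 15 -d test1 -k 55

# Option 2: encapsulate the same choices in a workflow file
# (see Test/test_ima.ini in the repository for the actual format)
# and reference that workflow instead of repeating the flags.
runPipeline -d test1 -w test_ima   # -w is an assumption; check runPipeline -h
```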
FrankLee-1987 commented 9 years ago

Thanks Sergey for your clarification. Can I use the same library size for my data?

skoren commented 9 years ago

It's pretty tolerant of incorrect insert sizes because most assemblers, and metAMOS itself, will re-estimate the insert size. However, I would recommend giving as close an estimate as you can for your data. Most Illumina inserts are 500-800bp, so I'd recommend 200:800 as the range if you have Illumina data.
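For Illumina data, the initPipeline call from the test would then become something like the following (the reads file name is a placeholder for your own interleaved paired-end file):

```shell
# Same flags as in run_pipeline_test.sh, but with a typical Illumina
# insert-size range of 200:3500 replaced by 200:800 (min:max).
../initPipeline -f -m my_illumina_pe.fna -d myproject -i 200:800
```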

FrankLee-1987 commented 9 years ago

Thanks. Do we need to execute just run_pipeline_test.sh, or all of the *test.sh files?

I ask because I am getting errors for some of the *test.sh files.

skoren commented 9 years ago

The run_pipeline_test.sh tests the core functionality. Some of the other tests require optional components and will give errors if those are not installed. Depending on what you installed when you ran python INSTALL.py (i.e., if you included iMetAMOS), you can also run run_sra.sh and run_ima.sh as tests of your installation.
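Concretely, from the Test directory that would look like this; the two optional tests only succeed if the corresponding components were selected during INSTALL.py:

```shell
./run_pipeline_test.sh   # core functionality; should pass on any install
./run_sra.sh             # optional: needs the extra components (e.g. iMetAMOS)
./run_ima.sh             # optional: needs the iMetAMOS assemblers/validators
```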