picrust / picrust2

Code, unit tests, and tutorials for running PICRUSt2
GNU General Public License v3.0

'When using epa-ng like this, a model has to be explicitly specified!' #23

Closed shafferm closed 6 years ago

shafferm commented 6 years ago

Hello,

I am trying to set up the picrust2 beta to play around with it. I have installed all dependencies, and they seem to work independently of picrust2. When I run pytest I get 4 errors, which all appear to be the same: epa-ng says it must be run with a model explicitly specified. I have pasted part of the error below. Does this mean anything to you, or does it seem to be an issue on my end?

Thanks, Mike

--------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------
papara called as: papara -t /Users/mish0397/git_sw/picrust2/tests/test_data/place_seqs/img_centroid_16S_aligned_head30.tre -s ref_seqs.phylip -q /Users/mish0397/git_sw/picrust2/tests/test_data/place_seqs/study_seqs_test.fasta -j 1 -n out
gap rate: 1139 19290
gap rate: 0.0590461
rate matrix: 2,2 p: 2,2
references container instantiated as: papara::references<pvec_pgap, sequence_model::tag_dna>
edges: 27
scoring scheme: -3 -1 2 -3
papara_core version 2.5
start scoring, using 1 threads
thread 0: 2.25299 gncup/s
scoring finished: 0.0308421
0.0701151 generating best scoring alignments
SUCCESS 0.0701232
INFO Selected: Output dir: /tmp/tmp8_76gnoc/epa_out/
INFO Selected: Query file: /tmp/tmp8_76gnoc/study_seqs_papara.fasta
INFO Selected: Tree file: /Users/mish0397/git_sw/picrust2/tests/test_data/place_seqs/img_centroid_16S_aligned_head30.tre
INFO Selected: Reference MSA: /tmp/tmp8_76gnoc/ref_seqs_papara.fasta
INFO Selected: Specified model: GTR+G
INFO Rate heterogeneity: GAMMA (4 cats, mean), alpha: 1 (ML), weights&rates: (0.25,0.136954) (0.25,0.476752) (0.25,1) (0.25,2.38629)
Base frequencies (ML): 0.25 0.25 0.25 0.25
Substitution rates (ML): 0.5 0.5 0.5 0.5 0.5 1
INFO Selected: Reading queries in chunks of: 5000
INFO Selected: Using threads: 1
INFO [EPA-ng ASCII-art banner] (v0.2.1-beta)
When using epa-ng like this, a model has to be explicitly specified! You may specify it generically (GTR+G), however parameters will not be optimized. Instead we reccommend to use RAxML to re-evaluate the parameters and then pass the resulting RAxML_info file to the epa-ng --model argument. epa-ng will then auto-parse the parameters.
( raxmlHPC -f e -s /tmp/tmp8_76gnoc/ref_seqs_papara.fasta -t /Users/mish0397/git_sw/picrust2/tests/test_data/place_seqs/img_centroid_16S_aligned_head30.tre -n info -m GTRGAMMAX )
Aborting with a failure.
--------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------
sweep: 84 -> 0
Error running this command:
epa-ng --tree /Users/mish0397/git_sw/picrust2/tests/test_data/place_seqs/img_centroid_16S_aligned_head30.tre --ref-msa /tmp/tmp8_76gnoc/ref_seqs_papara.fasta --query /tmp/tmp8_76gnoc/study_seqs_papara.fasta --chunk-size 5000 -T 1 -w /tmp/tmp8_76gnoc/epa_out

gavinmdouglas commented 6 years ago

It looks like there was a recent update to EPA-NG which breaks the place_seqs.py script, thanks for posting this! For now, reverting to an earlier version should resolve this error. I won't have a chance to address this until next week.

Best,

Gavin

shafferm commented 6 years ago

Thanks Gavin! Do you know roughly how far back I should go? Also, I have modified your dev-environment.yml so that it installs everything needed to compile all the dependencies on Linux. Would you be interested in a pull request for this?

gavinmdouglas commented 6 years ago

You can use EPA-ng version 0.2.1-beta listed here: https://github.com/Pbdas/epa-ng/releases

That sounds really helpful - a pull request would be welcome!

Gavin

shafferm commented 6 years ago

Thanks Gavin. I actually had to go a few commits past that to get around a compilation error, but now I'm passing all tests. For anyone else trying this, the commit ID is 7a48da3feb2ccec75f882e012401ecfe37f4c1b9.

vmaffei commented 6 years ago

Ran into this error as well. The error message gives a raxml command (example above: raxmlHPC -f e -s /tmp/tmp8_76gnoc/ref_seqs_papara.fasta -t /Users/mish0397/git_sw/picrust2/tests/test_data/place_seqs/img_centroid_16S_aligned_head30.tre -n info -m GTRGAMMAX), which when run produces a RAxML_info.info file containing model tuning parameters. Running epa-ng w/ --model RAxML_info.info fixes this. Just a heads up, Gavin, when you get a chance to take a look.

Also, older versions of raxml lack the GTRGAMMAX option, so make sure to use a recent release.
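
For anyone who wants to script the two-step workaround described above, here is a minimal Python sketch. It only assembles the two command lines, using the flags that appear in the error message and in the failing epa-ng call. The function names (raxml_eval_cmd, epa_ng_cmd) and the example paths are hypothetical, and it assumes RAxML writes RAxML_info.info into the directory it is run from. It is not how place_seqs.py does it.

```python
import shlex


def raxml_eval_cmd(ref_msa, tree, run_name="info"):
    """Build the RAxML call suggested by the epa-ng error message.

    With '-n info' the resulting file is named RAxML_info.info.
    (Hypothetical helper, not part of PICRUSt2.)
    """
    return ["raxmlHPC", "-f", "e", "-s", ref_msa, "-t", tree,
            "-n", run_name, "-m", "GTRGAMMAX"]


def epa_ng_cmd(tree, ref_msa, query, out_dir, model,
               threads=1, chunk_size=5000):
    """Build the epa-ng call from the failing command, plus --model.

    'model' is the path to the RAxML info file (assumed to be in the
    working directory). (Hypothetical helper, not part of PICRUSt2.)
    """
    return ["epa-ng", "--tree", tree, "--ref-msa", ref_msa,
            "--query", query, "--chunk-size", str(chunk_size),
            "-T", str(threads), "-w", out_dir, "--model", model]


# Example paths are placeholders; substitute your own files.
step1 = raxml_eval_cmd("ref_seqs_papara.fasta", "reference.tre")
step2 = epa_ng_cmd("reference.tre", "ref_seqs_papara.fasta",
                   "study_seqs_papara.fasta", "epa_out",
                   model="RAxML_info.info")

# Print the commands so they can be inspected before running them,
# e.g. with subprocess.run(step1, check=True).
print(shlex.join(step1))
print(shlex.join(step2))
```

Building the commands as argument lists rather than one string avoids quoting problems with paths, and lets you check the exact call before passing it to subprocess.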

gavinmdouglas commented 6 years ago

I'm glad you found a commit that works @shafferm - please let me know if you run into other issues!

Thanks for posting the fix as well @vmaffei!

These issues are going to keep arising since EPA-NG and GAPPA are under rapid development at the moment. As a temporary fix I added the source code of versions of these tools that are compatible with PICRUSt2 to the main PICRUSt2 repo and changed the installation instructions. Keeping PICRUSt2 in line with the current EPA-NG best practices will be an ongoing task, but this solution should work for now until they make a stable release.

gavinmdouglas commented 6 years ago

The model used by EPA-NG is now explicitly set to be GTR+G in PR https://github.com/picrust/picrust2/pull/27 - I'm closing this for now although tweaks to this pipeline will continue to be made.
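
As a rough illustration of what "explicitly set" means for the command line, the sketch below appends --model GTR+G to an epa-ng argument list unless a model is already given. The helper name with_explicit_model is hypothetical and this is not the code from the PR. As the epa-ng message notes, a generic GTR+G model means the parameters are not optimized.

```python
def with_explicit_model(cmd, model="GTR+G"):
    """Return a copy of an epa-ng argument list that always has --model.

    If the caller already passed --model, the list is returned unchanged
    (as a copy). (Hypothetical helper, not the actual PICRUSt2 change.)
    """
    cmd = list(cmd)
    if "--model" not in cmd:
        cmd += ["--model", model]
    return cmd


# The failing call from this issue, with placeholder paths.
failing = ["epa-ng", "--tree", "reference.tre",
           "--ref-msa", "ref_seqs_papara.fasta",
           "--query", "study_seqs_papara.fasta",
           "--chunk-size", "5000", "-T", "1", "-w", "epa_out"]

fixed = with_explicit_model(failing)
print(" ".join(fixed))
```

The same helper leaves a call untouched if it already points --model at a RAxML info file, so both approaches discussed in this thread can share one code path.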