picrust / picrust2

Code, unit tests, and tutorials for running PICRUSt2
GNU General Public License v3.0

place_seqs.py --ref_dir questions #276

Closed by bernt-matthias 1 year ago

bernt-matthias commented 1 year ago

Is it correct that place_seqs.py only works with --placement_tool epa-ng for the fungal reference data in picrust2/default_files/fungi/fungi_18S and picrust2/default_files/fungi/fungi_ITS?

I ask because the raxml_info files, which are present in the prokaryotic reference data, are missing from the fungal directories.

This is partly a guess from reading the source: https://github.com/picrust/picrust2/blob/90ff3d912e0619b56f946e37b432f553550ef477/picrust2/place_seqs.py#L259

Also, I'm not sure what the reference files described here really are: https://github.com/picrust/picrust2/wiki/Sequence-placement

gavinmdouglas commented 1 year ago

Hi @bernt-matthias,

Yes, at the time those files were created we had only implemented EPA-ng for placement. Just so you know, we show in the supplementary materials of the PICRUSt2 manuscript that you can get predictions that are better than random based on 18S/ITS data in fungi, but they are very noisy, so I would take the results with a major grain of salt.

If you mean the four custom reference files, these are: (1) a multiple sequence alignment of the reference sequences (so an alignment of 16S sequences for the main database); (2) a tree representing the inferred phylogenetic relationships of these sequences; (3) an HMM made with hmmbuild based on the multiple sequence alignment; and (4) the "model" file output by RAxML, specifying the parameters that were used to build the tree.

This last one is the tricky file, and it is required by EPA-ng. And yes, if you wanted to use SEPP, it requires a similar file in a different format, because it is hard-coded to expect that format. My best suggestion is to look at the examples in the prokaryotic database: these files tend to be plain text with a mix of comments and data values, so it can be tricky to know how to format a custom one if you need to do so.
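As a rough illustration of the file layout described above, here is a minimal Python sketch that checks whether a reference directory contains one file per role for a given placement tool. The file suffixes are assumptions for illustration (only raxml_info is named in this thread), the function name missing_ref_files is hypothetical, and this is not how place_seqs.py itself performs the check.

```python
from pathlib import Path

# ASSUMPTION: these suffixes are illustrative guesses, not read from the
# PICRUSt2 source. Compare them with the prokaryotic reference directory.
COMMON_SUFFIXES = {
    "alignment": (".fna", ".fasta", ".fa"),  # multiple sequence alignment
    "tree": (".tre",),                       # reference phylogeny
    "hmm": (".hmm",),                        # profile built with hmmbuild
}

# ASSUMPTION: EPA-ng reads a "model" file, SEPP a raxml_info file.
TOOL_SUFFIXES = {
    "epa-ng": {"model": (".model",)},
    "sepp": {"raxml_info": (".raxml_info",)},
}


def missing_ref_files(ref_dir, placement_tool):
    """Return a sorted list of roles with no matching file in ref_dir."""
    if placement_tool not in TOOL_SUFFIXES:
        raise ValueError(f"unknown placement tool: {placement_tool!r}")

    # Roles shared by both tools plus the tool-specific parameter file.
    expected = {**COMMON_SUFFIXES, **TOOL_SUFFIXES[placement_tool]}
    names = [p.name for p in Path(ref_dir).iterdir() if p.is_file()]

    # str.endswith accepts a tuple, so any listed suffix counts as a match.
    return sorted(
        role
        for role, suffixes in expected.items()
        if not any(name.endswith(suffixes) for name in names)
    )
```

Under these assumptions, calling missing_ref_files on a directory that has an alignment, tree, HMM, and model file but no raxml_info file would return an empty list for "epa-ng" and ["raxml_info"] for "sepp", which matches the situation described for the fungal reference data.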

I hope that helps!

Gavin

bernt-matthias commented 1 year ago

Thanks for the clarification.

For your reference, I'm working on the Galaxy tool https://github.com/galaxyproject/tools-iuc/pull/3904