picrust / picrust2

Code, unit tests, and tutorials for running PICRUSt2
GNU General Public License v3.0

Problem with the picrust2 key output file: KO metagenome table #77

Closed: Eve-Cheung closed this issue 5 years ago

Eve-Cheung commented 5 years ago

Hi! How can I transform the file 'pred_metagenome_unstrat.tsv' (e.g. ko00562) into a table with functional annotations by level (e.g. Inositol phosphate metabolism)? I tried the script 'categorize_by_function.py' from PICRUSt1, but it does not seem suitable for the PICRUSt2 output files. I always get the error: TypeError: izip argument #2 must support iteration. Thanks a lot!

gavinmdouglas commented 5 years ago

Hi @Eve-Cheung,

A number of people have asked me similar questions, so I added this to the FAQ here; the answer is re-pasted below.

The categorize_by_function.py script collapsed every gene family into all pathways it could potentially be involved in. In other words, all pathways with at least one required gene family present were assumed to be present, and each gene family contributed equally to every pathway it belongs to. There are many ways to infer pathway levels from gene family abundances, but I personally dislike this approach because I believe it results in a large proportion of false-positive pathways being reported. For this reason, a more stringent pathway prediction pipeline has been implemented in PICRUSt2 (identical to the approach used in HUMAnN2). In addition, the categorize_by_function.py script will not work with PICRUSt2 outputs directly. See the FAQ entry "How can I determine KEGG pathway abundances from the predicted KO abundances?" for a different method of inferring KEGG pathways in PICRUSt2.
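To make the collapsing behaviour described above concrete, here is a minimal Python sketch of it. It is not the PICRUSt1 code, only an illustration under stated assumptions: the KO IDs, pathway names, and the KO-to-pathway mapping are hypothetical placeholders, and each KO's full abundance is added to every pathway it maps to.

```python
from collections import defaultdict


def collapse_to_pathways(ko_abundance, ko_to_pathways):
    """Sketch of PICRUSt1-style collapsing (assumed behaviour, not the original code).

    ko_abundance: dict mapping KO ID -> abundance in one sample.
    ko_to_pathways: dict mapping KO ID -> list of pathway names.
    Every pathway with at least one observed KO is reported, and each KO
    adds its full abundance to every pathway it maps to.
    """
    pathway_abundance = defaultdict(float)
    for ko, abundance in ko_abundance.items():
        if abundance <= 0:
            continue  # absent KOs contribute nothing, so their pathways are not reported
        for pathway in ko_to_pathways.get(ko, []):
            pathway_abundance[pathway] += abundance
    return dict(pathway_abundance)


# Hypothetical mapping and abundances, for illustration only.
ko_to_pathways = {
    "KO_A": ["pathway_1", "pathway_2"],
    "KO_B": ["pathway_1"],
    "KO_C": ["pathway_3"],
}
sample = {"KO_A": 10.0, "KO_B": 5.0, "KO_C": 0.0}
result = collapse_to_pathways(sample, ko_to_pathways)
print(result)  # {'pathway_1': 15.0, 'pathway_2': 10.0}
```

Here pathway_2 is reported on the strength of a single KO, which is the kind of weakly supported call that the more stringent PICRUSt2 pathway pipeline is meant to avoid.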

However, if you are determined to get this output and are comfortable working in R, you can reproduce exactly what categorize_by_function.py would output by using the R function shared below. Please note that this code is not part of the official PICRUSt2 project and is simply shared here in the hope that it will be helpful. You can download the R code here: https://www.dropbox.com/s/x3b4996hnxz1dpc/picrust1_categorize_by_func.R?dl=1 and the legacy table of mappings from KOs to the BRITE hierarchy (a required input file) here: https://www.dropbox.com/s/1qei0k5z0lpy73m/picrust1_KO_BRITE_map.tsv?dl=1.

Note that if you use this function, it is worth running a sanity check on a file that you have also run through PICRUSt1, to make sure you are getting the expected output (an example of how to do this is at the bottom of the R script).
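The R script includes its own example of this check. As an independent illustration, a comparison of the two outputs could be sketched in Python as below. This assumes both outputs have been exported as plain tab-delimited tables with a header row, category IDs in the first column, and numeric sample columns after it; real PICRUSt1 exports may carry comment lines or metadata columns that would need stripping first. The example table contents are hypothetical.

```python
import csv
import io
import math


def read_table(handle):
    """Parse a tab-delimited table (header row, ID in first column) into {row_id: {column: float}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = {col: float(val) for col, val in zip(header[1:], row[1:])}
    return table


def tables_match(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """True if both tables share the same row IDs and columns and every value agrees within tolerance."""
    if a.keys() != b.keys():
        return False
    for row_id, values in a.items():
        if values.keys() != b[row_id].keys():
            return False
        for col, val in values.items():
            if not math.isclose(val, b[row_id][col], rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# Hypothetical stand-ins for the PICRUSt1 output and the R function's output.
picrust1_text = "category\tS1\tS2\npathway_1\t15\t3\npathway_2\t10\t0\n"
r_function_text = "category\tS1\tS2\npathway_1\t15.0\t3.0\npathway_2\t10.0\t0.0\n"
picrust1_table = read_table(io.StringIO(picrust1_text))
r_table = read_table(io.StringIO(r_function_text))
print(tables_match(picrust1_table, r_table))  # True
```

A tolerance-based comparison is used rather than exact string matching because the two tools may format numbers differently (for example 15 versus 15.0).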

Eve-Cheung commented 5 years ago

Thank you for your patience. I tried the script "pathway_pipeline.py", which solved my problem nicely. I now have the KEGG path_abundance table, but it only contains the ko IDs. If I want to match each ID to its pathway information, I have to look up the ko ID in the file KEGG_pathway_info.tsv. Is there a shortcut for this?

[Screenshots: path_abundance table; pathway_info table]

gavinmdouglas commented 5 years ago

OK, great. Yes, you can use the add_descriptions.py script and set the --custom_map_table option to point to the KEGG pathway info file.
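For readers curious about what such a description lookup amounts to, here is a rough Python illustration of the join. It is not the code of add_descriptions.py. It assumes the mapfile is tab-delimited with two columns (ID, then description) and no header, and that the abundance table has a header row with IDs in the first column; the table contents are hypothetical, apart from the ko00562 / Inositol phosphate metabolism pairing taken from the question above.

```python
import csv
import io


def load_descriptions(map_handle):
    """Read a two-column, tab-delimited mapfile (ID, description; no header assumed) into a dict."""
    return {row[0]: row[1] for row in csv.reader(map_handle, delimiter="\t") if len(row) >= 2}


def add_description_column(table_handle, descriptions, out_handle):
    """Copy a tab-delimited abundance table, inserting a 'description' column after the ID column."""
    reader = csv.reader(table_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow([header[0], "description"] + header[1:])
    for row in reader:
        if row:
            # IDs missing from the mapfile get an empty description instead of an error.
            writer.writerow([row[0], descriptions.get(row[0], "")] + row[1:])


# Hypothetical inputs standing in for KEGG_pathway_info.tsv and the path_abundance table.
mapfile = io.StringIO("ko00562\tInositol phosphate metabolism\n")
path_abun = io.StringIO("pathway\tS1\tS2\nko00562\t12.5\t3.0\n")
out = io.StringIO()
add_description_column(path_abun, load_descriptions(mapfile), out)
print(out.getvalue(), end="")
```

A dictionary lookup keeps the join linear in the number of table rows, which is why this pattern is preferable to searching the mapfile by hand for each ID.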

Eve-Cheung commented 5 years ago

Nice! I got it working. Thanks for your help!

wfgui commented 4 years ago

How do I get this table: KEGG_pathway_info.tsv?

gavinmdouglas commented 4 years ago

Hi @hjdong,

If you download the repository and move to picrust2/default_files/pathway_mapfiles/ you will see the mapfile.

wfgui commented 4 years ago


Thanks. I have another question: should I standardize the data before running the pathway_pipeline.py command?

gavinmdouglas commented 4 years ago

There usually isn't any transformation done between the metagenome and pathway prediction steps, but you certainly can if that's required for your analysis.

Best,

Gavin
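If an analysis does call for a transformation, one common option is rescaling each sample to relative abundances. The sketch below shows that in Python purely as an example; it is an assumption on my part, not something PICRUSt2 prescribes, and whether it is appropriate to apply before pathway_pipeline.py depends on your analysis. The KO IDs, sample names, and counts are hypothetical.

```python
def to_relative_abundance(table):
    """Rescale {feature: {sample: abundance}} so each sample's values sum to 1.

    Samples whose total is 0 are left as all zeros to avoid division by zero.
    """
    totals = {}
    for sample_values in table.values():
        for sample, value in sample_values.items():
            totals[sample] = totals.get(sample, 0.0) + value
    return {
        feature: {
            sample: (value / totals[sample] if totals[sample] else 0.0)
            for sample, value in sample_values.items()
        }
        for feature, sample_values in table.items()
    }


# Hypothetical predicted KO abundances for two samples.
ko_table = {"KO_A": {"S1": 30, "S2": 1}, "KO_B": {"S1": 10, "S2": 3}}
print(to_relative_abundance(ko_table))
# {'KO_A': {'S1': 0.75, 'S2': 0.25}, 'KO_B': {'S1': 0.25, 'S2': 0.75}}
```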