Closed Eve-Cheung closed 5 years ago
Hi @Eve-Cheung,
A number of people have asked me similar questions, so I added this to the FAQ; the answer is re-pasted below.
The categorize_by_function.py script collapsed all gene families to any pathways that they could potentially be involved in. In other words, every pathway with at least one required gene family present was assumed to be present, and each gene family contributed equally to all pathways it belongs to. There are many ways to infer pathway levels from gene family abundances, but I personally dislike this approach since I believe it results in a large proportion of false positive pathways being reported. For this reason, a more stringent pathway prediction pipeline has been implemented in PICRUSt2 (identical to the approach used in HUMAnN2). In addition, the categorize_by_function.py script will not work with PICRUSt2 outputs directly. See "How can I determine KEGG pathway abundances from the predicted KO abundances?" for a different method for inferring KEGG pathways in PICRUSt2.
However, if you are determined to get this output and are comfortable working in R, you can get output identical to what categorize_by_function.py would produce by using the R function shared below. Please note that this code is not part of the official PICRUSt2 project and is simply shared here in the hope that it will be helpful. You can download the R code here: https://www.dropbox.com/s/x3b4996hnxz1dpc/picrust1_categorize_by_func.R?dl=1 and the legacy table of mappings from KOs to BRITE hierarchy here (which is a required input file): https://www.dropbox.com/s/1qei0k5z0lpy73m/picrust1_KO_BRITE_map.tsv?dl=1.
Note that if you use this function, you could run a sanity check with a file you have already run through PICRUSt1 to make sure you are getting the expected output (an example of how to do this is at the bottom of the R script).
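To make the collapsing behaviour described above concrete, here is a minimal Python sketch. It is not the PICRUSt1 script or the shared R function, only an illustration of the idea that each gene family adds its full abundance to every pathway it maps to. The function name, the KO IDs, and the pathway names are made up for the example.

```python
from collections import defaultdict


def collapse_to_pathways(ko_abundance, ko_to_pathways):
    """Add each KO's full abundance to every pathway it maps to.

    ko_abundance: {ko_id: abundance} for a single sample (hypothetical input).
    ko_to_pathways: {ko_id: [pathway, ...]} mapping (hypothetical input).
    """
    pathway_abundance = defaultdict(float)
    for ko, abundance in ko_abundance.items():
        # KOs without any mapping contribute to no pathway.
        for pathway in ko_to_pathways.get(ko, []):
            pathway_abundance[pathway] += abundance
    return dict(pathway_abundance)


# Toy, made-up data for one sample.
ko_abundance = {"K00001": 10.0, "K00002": 5.0, "K00003": 2.0}
ko_to_pathways = {
    "K00001": ["Pathway A", "Pathway B"],  # counted in both pathways
    "K00002": ["Pathway B"],
}

result = collapse_to_pathways(ko_abundance, ko_to_pathways)
print(result)
```

In this toy case "Pathway A" is reported as present because a single mapped KO was detected, which is the kind of permissive call the stricter PICRUSt2 pipeline is meant to avoid.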
Thank you for your patience. I tried the pathway_pipeline.py script, which solved my problem. I now have a KEGG path_abundance table, but it contains only the ko IDs. To match each ID to its pathway information, I have to look up the ko ID in the file KEGG_pathway_info.tsv. Is there a shortcut for this?
Ok great. Yes, you can use the add_descriptions.py script and set the --custom_map_table option to point to the KEGG pathway info file.
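For readers who want to see what this kind of lookup amounts to, here is a rough Python sketch of joining descriptions onto a pathway table. It is not the add_descriptions.py implementation. It assumes the mapping file is a headerless, two-column, tab-separated file (ID, then description) and that the abundance table has a header row with the pathway ID in the first column; the function names, the "not_found" placeholder, and the toy values are illustrative only.

```python
import csv
import io


def load_descriptions(handle):
    """Read an assumed two-column, tab-separated ID -> description map."""
    descriptions = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            descriptions[row[0]] = row[1]
    return descriptions


def add_description_column(table_handle, descriptions, out_handle):
    """Insert a description column right after the ID column."""
    reader = csv.reader(table_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow([header[0], "description"] + header[1:])
    for row in reader:
        # IDs missing from the map get a placeholder instead of failing.
        label = descriptions.get(row[0], "not_found")
        writer.writerow([row[0], label] + row[1:])


# In-memory stand-ins for the mapping file and the abundance table.
map_file = io.StringIO("ko00562\tInositol phosphate metabolism\n")
table_file = io.StringIO(
    "pathway\tsampleA\tsampleB\n"
    "ko00562\t12.5\t3.0\n"
    "ko99999\t1.0\t0.0\n"
)
output = io.StringIO()

add_description_column(table_file, load_descriptions(map_file), output)
described_lines = output.getvalue().splitlines()
print("\n".join(described_lines))
```

With real files you would open them with `open(path, newline="")` in place of the `io.StringIO` objects.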
Nice! It worked! Thanks for your help!
How do I get this table: KEGG_pathway_info.tsv?
Hi @hjdong,
If you download the repository and go to picrust2/default_files/pathway_mapfiles/ you will see the mapfile.
Thanks,
I have another question: should I standardize the data before running pathway_pipeline.py?
There usually isn't any transformation done between the metagenome prediction and pathway prediction steps, but you certainly can apply one if that's required for your analysis.
Best,
Gavin
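If a transformation is wanted, one common choice is converting each sample to relative abundances. The sketch below assumes that is what "standardize" means here, which the thread does not confirm. It is only an illustration on a toy in-memory table, and the function name and data layout are made up. Whether any such step is appropriate depends on your analysis.

```python
def to_relative_abundance(table):
    """Scale each sample so its feature abundances sum to 1.

    table: {sample: {feature: abundance}} (hypothetical in-memory layout).
    Samples with a zero total are left as all zeros to avoid dividing by zero.
    """
    normalized = {}
    for sample, abundances in table.items():
        total = sum(abundances.values())
        normalized[sample] = {
            feature: (value / total if total else 0.0)
            for feature, value in abundances.items()
        }
    return normalized


# Toy, made-up predicted abundances.
table = {
    "sampleA": {"K00001": 30.0, "K00002": 10.0},
    "sampleB": {"K00001": 0.0, "K00002": 0.0},
}

relative = to_relative_abundance(table)
print(relative)
```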
Hi, how can I convert the file 'pred_metagenome_unstrat.tsv' (e.g. ko00562) into a table with functional annotations by level (e.g. Inositol phosphate metabolism)? I tried 'categorize_by_function.py' from PICRUSt1, but it does not seem to work with PICRUSt2 output files. I always get the error: TypeError: izip argument #2 must support iteration. Thanks a lot.