picrust / picrust2

Code, unit tests, and tutorials for running PICRUSt2
GNU General Public License v3.0

Missing metacyc nitrogen pathways? #296

Closed kaw97 closed 1 year ago

kaw97 commented 1 year ago

I inherited amplicon sequencing data on wild vs managed cranberry bogs and am using picrust2’s predictions to make hypotheses for a future shotgun sequencing experiment.

I was surprised to not see any predicted pathways directly related to nitrogen cycling in my dataset (although there were others connected to nitrogen cycling, like the urea cycle and certain amino acid pathways). The nitrogen cycle is one of the things we are most interested in because of the heavy ammonium fertilization of cultivated cranberry bogs.

On further digging, pred_metagenome_unstrat.descrip.tsv includes EC 1.7.1.15 (nitrite reductase NADH), which maps to RXN-13854 in ec_level4_to_metacyc_rxn.tsv. However, when I search for 13854 in metacyc_path2rxn_struc_filt_pro.txt it does not appear in the file.

Similarly, EC 1.7.99.4 was predicted in my dataset (nitrate reductase), which maps to Rxn-6369, Rxn-16471, and Nitratreduct-Rxn. None of these appear in metacyc_path2rxn_struc_filt_pro.txt.

Additionally, when I look at metacyc_pathways_info.txt I see many (>10) related to nitrogen assimilation/dissimilation. However, many of them, like PWY-5674 (nitrate reduction IV, dissimilatory), do not appear in metacyc_path2rxn_struc_filt_pro.txt.

Am I correct in interpreting this as certain pathways not being covered by the picrust pipeline, or possibly certain enzymes not being linked to all metacyc versions of the reaction? For example, the metacyc entry for EC 1.7.1.15 includes Neurospora crassa NC-nit-6, which is also annotated as EC 1.7.1.4, and EC 1.7.1.4 is included in PWY-5675. If pathways or links are missing, is there a straightforward way of systematically adding the missing connections?
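The manual check described above (look up the reactions for each predicted EC number, then search the pathway file for them) could be scripted. The following is only a minimal sketch, not part of PICRUSt2. It assumes the EC-to-reaction file has two tab-separated columns with comma-separated reaction IDs in the second, and that each pathway line is a pathway ID, a tab, and then reaction IDs mixed with structure tokens such as parentheses and commas. It also assumes a leading "-" marks an optional reaction. The function names and the toy lines are hypothetical.

```python
import re


def parse_ec_map(lines):
    """Return {EC: [reaction IDs]} from lines assumed to look like 'EC<TAB>RXN1,RXN2'."""
    ec_to_rxn = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines in this sketch
        ec_to_rxn[fields[0]] = [r for r in fields[1].split(",") if r]
    return ec_to_rxn


def reactions_in_pathways(lines):
    """Collect every reaction ID that appears in any pathway definition."""
    rxns = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t", 1)
        if len(fields) < 2:
            continue
        # Split on whitespace, commas, and parentheses (assumed structure tokens).
        for token in re.split(r"[\s,()]+", fields[1]):
            if token.startswith("-"):
                token = token[1:]  # assumption: '-' prefix marks an optional reaction
            if token:
                rxns.add(token)
    return rxns


def missing_reactions(ec_to_rxn, pathway_rxns):
    """Return {EC: [reactions absent from every pathway]} for ECs with at least one gap."""
    missing = {}
    for ec, rxns in ec_to_rxn.items():
        absent = [r for r in rxns if r not in pathway_rxns]
        if absent:
            missing[ec] = absent
    return missing


# Toy input: the first mapping is the one quoted above, the rest is made up.
ec_lines = ["1.7.1.15\tRXN-13854\n", "9.9.9.9\tRXN-A,RXN-D\n"]
pathway_lines = ["PWY-TOY\tRXN-A ( RXN-B , RXN-C ) -RXN-D\n"]

gaps = missing_reactions(parse_ec_map(ec_lines), reactions_in_pathways(pathway_lines))
print(gaps)
```

With real data, the two lists of lines would come from reading ec_level4_to_metacyc_rxn.tsv and metacyc_path2rxn_struc_filt_pro.txt. The comparison is case-sensitive, so reaction IDs need to be spelled exactly as they are in the files.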

gavinmdouglas commented 1 year ago

Hi there,

Yes, it's possible that those pathways were filtered out, either because they were dependent on too few reactions, or due to being added to a later version of the MetaCyc database than that file was based on.

If you know all the reactions involved in the pathways you are interested in (e.g., based on the MetaCyc website), then you could add them as additional lines to that file (although capturing all the redundancy and the key vs. optional reaction information would be confusing). I have to admit that I took this file from HUMAnN2, so I'm not sure exactly what workflow they used to generate it originally. However, at the very least, you could add in all reactions involved, simply with equal weighting (i.e., separated by spaces).

Cheers,

Gavin
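The "equal weighting" suggestion above could be sketched as follows. This is an illustration, not PICRUSt2 code: it assumes a pathway line consists of the pathway ID, a tab, and then the reaction IDs separated by single spaces. The helper name, the pathway ID, and the reaction IDs are placeholders.

```python
def make_pathway_line(pathway_id, reactions):
    """Build one pathway line with all reactions weighted equally.

    Assumed layout: '<pathway ID><TAB><RXN1> <RXN2> ...' (space-separated reactions).
    """
    if not reactions:
        raise ValueError("a pathway needs at least one reaction")
    for rxn in reactions:
        # Whitespace inside an ID would be read as a separator, so reject it early.
        if not rxn or any(ch.isspace() for ch in rxn):
            raise ValueError(f"invalid reaction ID: {rxn!r}")
    return pathway_id + "\t" + " ".join(reactions)


# Placeholder IDs only; the real reaction list would come from the MetaCyc page.
line = make_pathway_line("PWY-EXAMPLE", ["RXN-A", "RXN-B", "RXN-C"])
print(repr(line))
```

A line built this way carries no information about alternative or optional reactions, which is the trade-off mentioned above.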

kaw97 commented 1 year ago

Hi Gavin,

Sorry for pestering you about this again; I only recently came back to it. Where all do I need to add the pathways? I added my pathways of interest to metacyc_path2rxn_struc_filt_pro.txt and metacyc_pathways_structured_filtered. There were also some instances where the EC-coded enzyme wasn't in ec_level4_to_metacyc_rxn.tsv, so I added those as entries there. I did not add any reactions to metacyc_rxn_to_level4ec.tsv.

When I tried to run the pathway pipeline, I got the error below. Are there more places I need to make changes, or did I do something wrong with the formatting?

Some of the pathways on metacyc appeared to have multiple codes for certain reactions. It looked like this is encoded in the pathway2rxn file as (RXN-1 , RXN-2 , ...). Here is a link to my edited files:

https://drive.google.com/drive/folders/1LPjh0kvj2-sZi6ntp76AMCYPYAHSl6Pi?usp=sharing

(picrust2) ➜ picrust pathway_pipeline.py -i EC_metagenome_more_pathways_out/pred_metagenome_contrib.tsv.gz -o more_pathways_out -p 1
Traceback (most recent call last):
  File "/home/kyle/miniconda3/envs/picrust2/bin/pathway_pipeline.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/kyle/picrust2-2.5.1/scripts/pathway_pipeline.py", line 272, in <module>
    main()
  File "/home/kyle/picrust2-2.5.1/scripts/pathway_pipeline.py", line 196, in main
    unstrat_abun_per_seq = pathway_pipeline(
  File "/home/kyle/picrust2-2.5.1/picrust2/pathway_pipeline.py", line 351, in pathway_pipeline
    in_metagenome = regroup_func_ids(in_metagenome, in_format,
  File "/home/kyle/picrust2-2.5.1/picrust2/pathway_pipeline.py", line 1173, in regroup_func_ids
    func_map[line_split[0]] += line_split[1].split(",")
IndexError: list index out of range

Thanks,

Kyle
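The failing statement in the traceback indexes `line_split[1]`, so the IndexError suggests that at least one line of a mapping file read by `regroup_func_ids` split into fewer than two fields, for example a blank line or a line where the ID and the reaction list are separated by spaces instead of the expected delimiter. One way to locate such lines in an edited file is sketched below. The traceback does not show how `line_split` is produced, so the tab delimiter here is an assumption, and the function name is hypothetical.

```python
def find_malformed_lines(lines, sep="\t"):
    """Return (line number, text) pairs for lines without two non-empty fields.

    Assumption: each valid line is '<ID><sep><comma-separated reaction IDs>'.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        fields = text.split(sep)
        # Fewer than two fields reproduces the condition behind the IndexError;
        # an empty ID or empty reaction list is flagged as well.
        if len(fields) < 2 or not fields[0] or not fields[1]:
            bad.append((number, text))
    return bad


# Toy lines: one valid, one blank, one space-separated, one valid without newline.
example = ["A\tR1,R2\n", "\n", "B R3\n", "C\tR4"]
for number, text in find_malformed_lines(example):
    print(f"line {number}: {text!r}")
```

For a real check, the lines would be read from the edited file, e.g. `find_malformed_lines(open(path))` with `path` pointing at the edited mapping file. An empty result would mean the problem lies elsewhere.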