Closed rozlynboutin closed 5 years ago
Hey @rozlynboutin,
Yes, you can get predictions for ITS sequences now, with the strong caveat that the predictions are much less accurate than for prokaryotes. You should check out the pre-print, which describes the fungal validations: https://www.biorxiv.org/content/10.1101/672295v1. To me the fungal prediction results are a nice proof of concept that metagenome inference for eukaryotes is possible and that it will likely improve as more genomes become available. However, it's still unclear how useful these predictions currently are and how much useful signal there is above the noise... One reason this is unclear is that it's difficult to compare the predictions with a gold standard of expected functions.
With that major caveat in mind, though, you can get the predictions with the standalone version of PICRUSt2 (see here: https://github.com/picrust/picrust2/wiki/Workflow), but not with the QIIME2 plugin. You will need to point the appropriate options to the fungal ITS databases, which are in default_files/fungi (of the GitHub repository), for the sequence placement and hidden-state prediction steps.
Thanks for the quick reply and for the info! I anticipated that the predictions might be less accurate than the prokaryotic ones, but I'll go check out the paper and try my luck using the standalone version.
Hi again!
I've gotten the pipeline to run (hooray!), but now I am trying to visualize the results in STAMP. When I try to load in the path_abun_unstrat_descrip.tsv as the Profile File, I am getting the following error:
Data does not form a strict hierarchy. Child not_found has multiple parents (e.g., P185-PWY, PWY-2723).
Is there an easy fix for this? Sorry, I'm not very familiar with STAMP. Is there an R package I can use to do the analysis as well?
Thanks in advance!
Hey @rozlynboutin,
Great to hear! That is a STAMP error occurring because it thinks the description and function columns are forming a hierarchy. It's throwing an error because the "not_found" description has multiple pathway parents (which it doesn't allow). You could just use the version of the table without the description column to avoid this issue.
However, no pathways should be missing descriptions so I think this is a bug specifically for the fungi pathways and I'll make sure those pathway descriptions are included in the next release.
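If only the table with descriptions is at hand, the workaround above (using a table without the description column) can be sketched in Python. This is a minimal sketch and not part of PICRUSt2 or STAMP: the function name is hypothetical, the column names "pathway" and "description" are assumptions about the table header, and the sample names and abundance values are made up for illustration. Only the two pathway IDs come from the STAMP error message.

```python
import csv
import io


def drop_description_column(tsv_text, column="description"):
    """Return tab-separated text with the named column removed from every row."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    # Nothing to do for empty input or when the column is absent.
    if not rows or column not in rows[0]:
        return tsv_text
    idx = rows[0].index(column)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row[:idx] + row[idx + 1:])
    return out.getvalue()


# Hypothetical example table: sample names and values are illustrative only.
example = (
    "pathway\tdescription\tsampleA\tsampleB\n"
    "P185-PWY\tnot_found\t1.5\t0.0\n"
    "PWY-2723\tnot_found\t2.0\t3.25\n"
)
stripped = drop_description_column(example)
print(stripped)
```

Without the description column, each pathway ID no longer shares the "not_found" label with other pathways, which is the condition the STAMP error message reports.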
Okay, thanks! I'll give it a try without the descriptions. In the meantime, is there a way that I can annotate the description myself based on the pathway ID?
Actually, you should be able to point to this description mapfile with add_descriptions.py: picrust2/default_files/description_mapfiles/metacyc_pathways_info_fungi.txt.gz
The default MetaCyc description mapfile is just for prokaryotes so hopefully that resolves this problem.
I used this command:
add_descriptions.py -i pathways_out/path_abun_unstrat.tsv.gz -m METACYC \
    -o pathways_out/path_abun_unstrat_descrip.tsv.gz \
    --custom_map_table picrust2-master/picrust2/default_files/description_mapfiles/metacyc_pathways_info_fungi.txt.gz
But some of the descriptions are still not found.
Ah I see - I'll have to look into that. In the meantime you can look up specific pathway ids on the MetaCyc website: https://metacyc.org/
Hey @rozlynboutin,
I think the issue was that you specified both the -m option and the --custom_map_table option. It used the default MetaCyc descriptions (which are for prokaryotic pathways), which is why there are a lot of unknown pathways. This is definitely confusing, so I'll alter the script so it can't happen in the future.
I see! Thanks so much for following up. So I should not specify the -m option if I specify the custom_map_table then?
Yes that's right - sorry for the confusion.
No problem, this makes sense!
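The rule agreed on above (pass either -m or --custom_map_table, never both) can be sketched as a small Python helper that assembles the argument list before the script is run. The helper name is hypothetical and not part of PICRUSt2. It uses only the flags that appear in this thread (-i, -o, -m, --custom_map_table), and the paths are the ones from the earlier command.

```python
def build_add_descriptions_cmd(in_table, out_table, map_type=None, custom_map=None):
    """Build an add_descriptions.py argument list with exactly one mapping source."""
    # Reject the combination that caused the "not_found" descriptions,
    # and also the case where no mapping source is given at all.
    if (map_type is None) == (custom_map is None):
        raise ValueError("give exactly one of map_type or custom_map")
    cmd = ["add_descriptions.py", "-i", in_table, "-o", out_table]
    if custom_map is not None:
        cmd += ["--custom_map_table", custom_map]
    else:
        cmd += ["-m", map_type]
    return cmd


fungal_cmd = build_add_descriptions_cmd(
    "pathways_out/path_abun_unstrat.tsv.gz",
    "pathways_out/path_abun_unstrat_descrip.tsv.gz",
    custom_map=(
        "picrust2-master/picrust2/default_files/description_mapfiles/"
        "metacyc_pathways_info_fungi.txt.gz"
    ),
)
print(" ".join(fungal_cmd))
```

The resulting list could be handed to subprocess.run in an environment where PICRUSt2 is installed; the sketch itself only builds the arguments and does not run anything.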
Hi @gavinmdouglas. I'm trying to do the same as @rozlynboutin. I already have dna-sequences.fasta from the q2-dada2 plugin, have downloaded the fungal ITS databases which are in default_files/fungi (of the GitHub repository) to /home/picrust2-run/fungi_ITS, and am running this command in the picrust2-v2.4.1 environment:
place_seqs.py -s dna-sequences.fasta -o placed_seqs.tre -p 4 --intermediate placement_working -t epa-ng --ref_dir /home/picrust2-run/fungi_ITS --verbose
I got these error messages:
Error running this command: hmmalign --trim --dna --mapali /home/picrust2-run/fungi_ITS/fungi_ITS.fna.gz --informat FASTA -o placement_working/query_align.stockholm /home/picrust2-run/fungi_ITS/fungi_ITS.hmm dna-sequences.fasta
Standard error of the above failed command:
Error: File format problem in trying to open HMM file /home/picrust2-run/fungi_ITS/fungi_ITS.hmm. Format tag is '<!DOCTYPE': unrecognized. Current H3 format is 'HMMER3/f'. Previous H2/H3 formats also supported.
Is there a solution to this? Thanks.
Hi @didietkeren,
What version of HMMER are you using?
Thanks,
Gavin
I don't know yet about the version, I'll check it. It's from your github page (picrust2 standalone).
@gavinmdouglas, thank you for this wonderful work. Based on what you said, I only have to change this command line:
place_seqs.py -s study_seqs.fna -o placed_seqs.tre -p 1 --intermediate placement_working
to this:
place_seqs.py -s study_seqs.fna -o placed_seqs.tre -p 1 --intermediate placement_working --ref_dir path/to/picrust2/default_files/fungi
Then I only have to delete this command line:
hsp.py -i 16S -t placed_seqs.tre -o marker_nsti_predicted.tsv.gz -p 1 -n
and run only these two:
hsp.py -i EC -t placed_seqs.tre -o EC_predicted.tsv.gz -p 1
hsp.py -i KO -t placed_seqs.tre -o KO_predicted.tsv.gz -p 1
Are there other things to be modified? Thanks in advance!
Hi @gavinmdouglas,
I was wondering if I can run the PICRUSt2 steps with the KEGG/KO options on my own soil ITS and 18S data when using the custom databases provided on the PICRUSt2 GitHub page. I have done it with the EC/MetaCyc options, which worked, but I would also like to look at what KEGG/KO can give me. Can you give me some insight into whether that's already possible with PICRUSt2?
Thanks in advance!
Hi!
Thanks for the great plugin on the qiime2 platform. I've heard that it is now possible to run picrust2 for ITS data, but was wondering how I can do this in qiime2 using the qiime picrust2 full-pipeline command? Thanks in advance!