eberdan opened this issue 6 years ago
Hey eberdan, my first guess is that the ko_precalculated.tab file did not generate properly! A couple of questions:
If you run `grep study ko_precalculated.tab`, do you get any hits (e.g., study_1, study_2, etc.)?
Are you running Anaconda on Mac or Linux?
If it's not too much trouble, would you mind posting the code you ran, from start to error? I'm happy to take a look for any potential issues.
Hi,
I am running everything on Linux. I checked both ko_precalculated.tab and my sample_counts file, and both contain study_x IDs. Here is the first bit of my ko_precalculated file (after converting it to TSV with biom convert):
Here is the first bit of my sample_counts table:
```
{ "id": null, "format": "Biological Observation Matrix 1.0.0-dev", "format_url": "http://biom-format.org/documentation/format_versions/biom-1.0.html", "type": "OTU table", "generated_by": "biom 0.4.0", "date": "2017-11-08 12:54:31", "matrix_type": "dense", "matrix_element_type": "int", "shape": [ 10995, 95 ], "rows": [ { "id": "study_1", "metadata": null }, { "id": "study_2", "metadata": null }, { "id": "study_3", "metadata": null }, { "id": "study_4", "metadata": null }, { "id": "study_5", "metadata": null }, { "id": "study_6", "metadata": null }, { "id": "study_7", "metadata": null }, {
```
As for the code, I used exactly what you posted, except that for predict_traits.py I did not use the -g option, because I kept getting the error message "argument list too long".
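To see exactly which IDs a BIOM 1.0 JSON table like the sample_counts file above contains, its "rows" list can be read with Python's standard json module. This is a minimal sketch: the function name is hypothetical, and it assumes the BIOM 1.0 JSON layout shown above (BIOM 2.x HDF5 files are binary and will not parse this way).

```python
import json


def row_ids_from_biom_json(text):
    """Return the row (observation) IDs from the text of a BIOM 1.0 JSON table.

    Assumes the BIOM 1.0 layout, where "rows" is a list of
    {"id": ..., "metadata": ...} objects.
    """
    table = json.loads(text)
    return [row["id"] for row in table["rows"]]
```

For the table above, `row_ids_from_biom_json(open("sample_counts.biom").read())` should return a list starting with "study_1", "study_2", and so on.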
Thanks! I took a closer look at the first error you posted. It occurs at the step in predict_metagenomes.py where the KEGG metadata in the precalculated file is parsed (convert_precalc_to_biom in util.py). Would you mind gzipping and attaching your ko_precalculated.tab file (not the *.biom one)? I'll take a look and compare it to some of mine that work with the metadata.
Edit: on second thought, that file will be huge without the -g option on predict_traits.py. Rerun with the -l option instead when you get a chance:
```bash
# convert biom to tsv using biom-format
biom convert -i sample_counts.biom -o sample_counts.tab --to-tsv

# predict traits using -l in place of -g
predict_traits.py -i ./genome_prediction/format/KEGG/trait_table.tab \
  -t ./genome_prediction/format/KEGG/reference_tree.newick \
  -r ./genome_prediction/asr/KEGG_asr_counts.tab \
  -o ./genome_prediction/predict_traits/ko_precalculated.tab \
  -a -c ./genome_prediction/asr/asr_ci_KEGG.tab \
  -l sample_counts.tab

# add KEGG metadata
cat kegg_meta >> ./genome_prediction/predict_traits/ko_precalculated.tab
```

Then gzip the result to attach it here:

```bash
gzip ./genome_prediction/predict_traits/ko_precalculated.tab
```
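The IndexError in this thread is raised at `fields[idx+1]` inside convert_precalc_to_biom, so the most likely trigger is a metadata row with fewer tab-separated fields than there are trait columns. The sketch below scans a precalculated table for such rows before predict_metagenomes.py is run. It assumes that metadata rows and metadata header columns start with the prefix `metadata_`; that value is an assumption, so compare it with `md_prefix` in your installed picrust/util.py. The helper name is hypothetical.

```python
# Assumption: metadata rows/columns carry this prefix; check md_prefix in
# picrust/util.py for your installed version.
MD_PREFIX = "metadata_"


def short_metadata_rows(lines, md_prefix=MD_PREFIX):
    """Return (row_id, expected, found) for metadata rows that are too short.

    `lines` is any iterable of tab-separated lines (e.g. an open file) whose
    first line is the header. A metadata row needs one field for its ID plus
    one per trait column, i.e. one per non-metadata header column.
    Only line endings are stripped, so empty trailing fields still count.
    """
    it = iter(lines)
    header = next(it).rstrip("\r\n").split("\t")
    expected = sum(1 for col in header if not col.startswith(md_prefix))
    problems = []
    for line in it:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0].startswith(md_prefix) and len(fields) < expected:
            problems.append((fields[0], expected, len(fields)))
    return problems
```

For example, `with open("ko_precalculated.tab") as fh: print(short_metadata_rows(fh))` lists each short metadata row with its expected and actual field counts; an empty list means every metadata row has a value for every trait column.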
Thanks! So far everything looks fine in the attached file. Out of curiosity, which versions of biom-format and h5py are you running?
```bash
# check biom-format and h5py versions
pip list | grep biom-format
pip list | grep h5py
```
Have you run PICRUSt successfully in the past (prior to running this pipeline)?
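`pip list` reports what pip sees, which on a shared cluster is not necessarily what the Python running predict_metagenomes.py imports. A minimal sketch for asking that interpreter directly; `module_versions` is a hypothetical helper, and it relies only on `importlib.import_module` and on biom and h5py exposing `__version__`. It should run under both Python 2.7 and 3.

```python
import importlib


def module_versions(names):
    """Map each module name to its __version__, or None if it cannot be imported.

    Modules that import but define no __version__ are reported as "unknown".
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `print(module_versions(["biom", "h5py"]))` with the same interpreter that runs PICRUSt (presumably the anaconda2 Python under /usr/local/packages, judging by the traceback paths) shows the versions it actually loads.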
biom, version 2.1.6
h5py (2.5.0)
I have never run PICRUSt before. I am running it now on a Linux cluster:
Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
The software was installed on the cluster for me. Nobody else has used it.
I spoke too soon! There were in fact a few issues with the KEGG metadata in the attached precalculated file (none of which were errors on your part!).
Give this corrected file a try in predict_metagenomes.py: ko_precalculated.fix.tab.gz
Let me know if this works for you. If it does, I'll update the pipeline to fix this error in the future.
Thank you for your patience!
It worked! Thank you so much!!!
No problem at all. The pipeline has been updated as well. Thanks again for bringing this to my attention!!
Hi,
Thanks for this pipeline! I am running it and having trouble with the predict_metagenomes.py step. When I run the standard predict_traits.py command you recommend and then append the KEGG metadata, I get the following error from predict_metagenomes.py:
```
Traceback (most recent call last):
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 375, in <module>
    main()
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 185, in main
    ids_to_load=ids_to_load,verbose=opts.verbose,transpose=True)
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 140, in load_data_table
    genome_table = convert_precalc_to_biom(genome_table_fh,ids_to_load,transpose=transpose)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/picrust/util.py", line 84, in convert_precalc_to_biom
    row_meta[idx][row_id[len(md_prefix):]]=parse_metadata_field(fields[idx+1],metadata_type)
IndexError: list index out of range
```
Someone on the PICRUSt help site told me that ko_precalculated.tab needs to be in BIOM format. I tried converting ko_precalculated.tab to BIOM with the biom convert command (e.g., `biom convert -i ko_precalculated_table.tab -o table.from_txt_json.biom --table-type="OTU table" --to-json`), but that does not work; it fails with this error:
```
Traceback (most recent call last):
  File "/usr/local/packages/anaconda2/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/biom/cli/table_converter.py", line 114, in convert
    table = load_table(input_fp)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/biom/parse.py", line 656, in load_table
    raise TypeError("%s does not appear to be a BIOM file!" % f)
TypeError: ko_precalculated_table.tab does not appear to be a BIOM file!
```
I can run predict_traits.py with the --output_precalc_file_in_biom option to get a BIOM file, but then I can't append the KEGG data with cat, and when I run predict_metagenomes.py I get this error:
```
Traceback (most recent call last):
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 375, in <module>
    main()
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 185, in main
    ids_to_load=ids_to_load,verbose=opts.verbose,transpose=True)
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 140, in load_data_table
    genome_table = convert_precalc_to_biom(genome_table_fh,ids_to_load,transpose=transpose)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/picrust/util.py", line 100, in convert_precalc_to_biom
    raise ValueError,"No OTUs match identifiers in precalculated file. PICRUSt requires an OTU table reference/closed picked against GreenGenes.\nExample of the first 5 OTU ids from your table: {0}".format(', '.join(list(ids_to_load)[:5]))
ValueError: No OTUs match identifiers in precalculated file. PICRUSt requires an OTU table reference/closed picked against GreenGenes.
Example of the first 5 OTU ids from your table: study_112, study_3027, study_3026, study_3025, study_3024
```
I am not sure how to proceed from here.
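The ValueError above is raised when none of the OTU IDs in the sample table (such as study_112 and study_3027) appear as row IDs in the precalculated file that predict_metagenomes.py reads. A minimal sketch for checking that overlap yourself on the tab-separated precalculated file; it assumes row IDs sit in the first column, the first line is a header, and metadata rows start with `metadata_` (an assumption; check `md_prefix` in picrust/util.py). The function name is hypothetical.

```python
def id_overlap(otu_ids, precalc_lines, md_prefix="metadata_"):
    """Split OTU IDs into (found, missing) against a precalc table's row IDs.

    `precalc_lines` is an iterable of tab-separated lines whose first line is
    the header; rows whose ID starts with md_prefix (assumed "metadata_")
    are treated as metadata and skipped. Both returned lists are sorted.
    """
    it = iter(precalc_lines)
    next(it)  # skip the header line
    precalc_ids = set()
    for line in it:
        row_id = line.split("\t", 1)[0].strip()
        if row_id and not row_id.startswith(md_prefix):
            precalc_ids.add(row_id)
    wanted = set(otu_ids)
    return sorted(wanted & precalc_ids), sorted(wanted - precalc_ids)
```

An empty `found` list reproduces the condition behind this error; the `missing` list shows which sample IDs have no row in the precalculated file.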