vmaffei / dada2_to_picrust

Experimental pipeline to perform de novo PICRUSt on de-noised amplicon sequence variants (ASV)

predict_metagenomes.py error #8

Open eberdan opened 6 years ago

eberdan commented 6 years ago

Hi,

Thanks for this pipeline! I am running it and having trouble with the predict_metagenomes.py step. When I run the normal predict_traits.py command you recommend and then add the KEGG metadata, I get the following error from predict_metagenomes.py:

Traceback (most recent call last):
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 375, in <module>
    main()
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 185, in main
    ids_to_load=ids_to_load,verbose=opts.verbose,transpose=True)
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 140, in load_data_table
    genome_table = convert_precalc_to_biom(genome_table_fh,ids_to_load,transpose=transpose)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/picrust/util.py", line 84, in convert_precalc_to_biom
    row_meta[idx][row_id[len(md_prefix):]]=parse_metadata_field(fields[idx+1],metadata_type)
IndexError: list index out of range

Someone on the PICRUSt help site told me that ko_precalculated.tab needs to be in biom format. I have tried to convert ko_precalculated.tab to biom using the biom convert command (e.g. biom convert -i ko_precalculated_table.tab -o table.from_txt_json.biom --table-type="OTU table" --to-json), but that fails with the error:

Traceback (most recent call last):
  File "/usr/local/packages/anaconda2/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/biom/cli/table_converter.py", line 114, in convert
    table = load_table(input_fp)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/biom/parse.py", line 656, in load_table
    raise TypeError("%s does not appear to be a BIOM file!" % f)
TypeError: ko_precalculated_table.tab does not appear to be a BIOM file!

I can run predict_traits.py with the --output_precalc_file_in_biom option and get a biom file, but then I can't cat the KEGG data onto it, and predict_metagenomes.py fails with the error:

Traceback (most recent call last):
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 375, in <module>
    main()
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 185, in main
    ids_to_load=ids_to_load,verbose=opts.verbose,transpose=True)
  File "/usr/local/packages/anaconda2/bin/predict_metagenomes.py", line 140, in load_data_table
    genome_table = convert_precalc_to_biom(genome_table_fh,ids_to_load,transpose=transpose)
  File "/usr/local/packages/anaconda2-2.5.0/lib/python2.7/site-packages/picrust/util.py", line 100, in convert_precalc_to_biom
    raise ValueError,"No OTUs match identifiers in precalculated file. PICRUSt requires an OTU table reference/closed picked against GreenGenes.\nExample of the first 5 OTU ids from your table: {0}".format(', '.join(list(ids_to_load)[:5]))
ValueError: No OTUs match identifiers in precalculated file. PICRUSt requires an OTU table reference/closed picked against GreenGenes.
Example of the first 5 OTU ids from your table: study_112, study_3027, study_3026, study_3025, study_3024

I am not sure how to proceed from here.

vmaffei commented 6 years ago

Hey eberdan, my first guess is that the ko_precalculated.tab file did not generate properly! Couple of questions:

If you run grep study ko_precalculated.tab, do you get any hits (e.g. study_1, study_2, etc.)?

Are you running anaconda on Mac or Linux?

If it's not too much trouble, do you mind posting the code you ran from start to error? Happy to take a look for any potential issues.
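A rough Python equivalent of the grep check above, as a sketch only: it collects the first-column IDs from a tab-delimited precalculated file and reports which of them also appear in the count table. The `metadata_` row prefix, the header name `#OTU_IDs`, and the example rows are assumptions made for illustration, not taken from the real files.

```python
def row_ids(lines, md_prefix="metadata_"):
    """Collect first-column IDs from a tab-delimited table.

    Assumes the first line is a header and that metadata rows start with
    md_prefix (both are assumptions about the file layout).
    """
    ids = set()
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index == 0 or not line:
            continue  # skip the header and blank lines
        first_field = line.split("\t", 1)[0]
        if not first_field.startswith(md_prefix):
            ids.add(first_field)
    return ids


def shared_ids(precalc_lines, table_ids):
    """Return the IDs found in both the precalculated file and the count table."""
    return row_ids(precalc_lines) & set(table_ids)


# Made-up rows, for illustration only
example = [
    "#OTU_IDs\tK00001\tK00002\n",
    "study_1\t1.0\t0.0\n",
    "470925\t2.0\t1.0\n",
    "metadata_KEGG_Description\tdesc a\tdesc b\n",
]
print(sorted(shared_ids(example, ["study_1", "study_2"])))  # ['study_1']
```

With a real file, an open file handle can be passed as `lines`. An empty result would match the "No OTUs match identifiers" error reported above.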

eberdan commented 6 years ago

Hi,

I am running everything on Linux. I checked both ko_precalculated.tab and my sample_counts file, and both contain study_x IDs. Here is the first bit of my ko_precalculated file (after converting it to TSV with biom convert):

# Constructed from biom file
#OTU ID study_2603 study_5636 study_3359 470925 study_8984

Here is the first bit of my sample_counts table:

{
  "id": null,
  "format": "Biological Observation Matrix 1.0.0-dev",
  "format_url": "http://biom-format.org/documentation/format_versions/biom-1.0.html",
  "type": "OTU table",
  "generated_by": "biom 0.4.0",
  "date": "2017-11-08 12:54:31",
  "matrix_type": "dense",
  "matrix_element_type": "int",
  "shape": [ 10995, 95 ],
  "rows": [
    { "id": "study_1", "metadata": null },
    { "id": "study_2", "metadata": null },
    { "id": "study_3", "metadata": null },
    { "id": "study_4", "metadata": null },
    { "id": "study_5", "metadata": null },
    { "id": "study_6", "metadata": null },
    { "id": "study_7", "metadata": null },
    {

As for the code, I used exactly what you posted, except that for predict_traits.py I did not use the -g option because I kept getting the error message "argument list too long".

vmaffei commented 6 years ago

Thanks! I took a closer look at the first error you posted. It is occurring at the precalculated file KEGG_metadata parsing step in predict_metagenomes (from util.py). Would you mind gzipping and attaching your ko_precalculated.tab file (not the *.biom one)? I'll take a look and compare to some of mine that work with the metadata.
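The IndexError in the first traceback is raised on `fields[idx+1]`, which would be consistent with a metadata row having fewer tab-separated fields than there are trait columns. A minimal Python sketch for spotting such rows follows. It assumes metadata rows and metadata columns both start with `metadata_`; that prefix, the header name, and the example rows are illustrative assumptions, not confirmed details of the real file.

```python
def short_metadata_rows(lines, md_prefix="metadata_"):
    """List metadata rows that have fewer values than there are trait columns.

    Returns tuples of (line_number, row_id, values_found, values_expected).
    """
    rows = iter(lines)
    header = next(rows, "").rstrip("\n").split("\t")
    # Trait columns are every header column except the ID column and
    # any column whose name starts with md_prefix (assumed layout).
    n_traits = sum(1 for name in header[1:] if not name.startswith(md_prefix))
    problems = []
    for lineno, line in enumerate(rows, start=2):
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith(md_prefix) and len(fields) < n_traits + 1:
            problems.append((lineno, fields[0], len(fields) - 1, n_traits))
    return problems


# Made-up rows, for illustration only
example = [
    "#OTU_IDs\tK00001\tK00002\tmetadata_NSTI\n",
    "study_1\t1.0\t0.0\t0.05\n",
    "metadata_KEGG_Description\tdesc a\tdesc b\n",
    "metadata_KEGG_Pathways\tpath a\n",
]
print(short_metadata_rows(example))  # [(4, 'metadata_KEGG_Pathways', 1, 2)]
```

An empty list means every metadata row carries at least one value per trait column. It does not prove the values themselves are well formed.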

Edit: on second thought, that file will be huge without the -g option on predict_traits.py. When you get a chance, run it with the -l option instead:

# convert biom to tsv using biom-format
biom convert -i sample_counts.biom -o sample_counts.tab --to-tsv
# predict traits using -l in place of -g
predict_traits.py -i ./genome_prediction/format/KEGG/trait_table.tab \
   -t ./genome_prediction/format/KEGG/reference_tree.newick \
   -r ./genome_prediction/asr/KEGG_asr_counts.tab \
   -o ./genome_prediction/predict_traits/ko_precalculated.tab \
   -a -c ./genome_prediction/asr/asr_ci_KEGG.tab \
   -l sample_counts.tab
# add KEGG metadata
cat kegg_meta >> ./genome_prediction/predict_traits/ko_precalculated.tab

Then run gzip ./genome_prediction/predict_traits/ko_precalculated.tab and attach the result here.
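One thing to keep in mind with the `cat kegg_meta >>` step: appending only lines up correctly if the existing table already ends with a newline. Otherwise the first metadata row is glued onto the last data row. A small Python sketch of a newline-safe append is shown below. The function name `append_rows` is hypothetical, and this is not meant to suggest that a missing newline was the problem in this thread.

```python
def append_rows(table_text, metadata_text):
    """Concatenate two blocks of rows, making sure each ends with a newline."""
    if table_text and not table_text.endswith("\n"):
        table_text += "\n"
    if metadata_text and not metadata_text.endswith("\n"):
        metadata_text += "\n"
    return table_text + metadata_text


# Made-up content, for illustration only: the table lacks a trailing newline
table = "#OTU_IDs\tK00001\nstudy_1\t1.0"
meta = "metadata_KEGG_Description\tdesc a\n"
print(repr(append_rows(table, meta)))
```

For real files, the two arguments would be the contents read from ko_precalculated.tab and kegg_meta, with the result written back to the precalculated file.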

eberdan commented 6 years ago

Hi,

I ran the command as suggested. The file is attached here.

ko_precalculated.tab.gz

vmaffei commented 6 years ago

Thanks! So far everything looks fine in the attached file. Out of curiosity, which versions of biom-format and h5py are you running?

# check biom-format and h5py versions
pip list | grep biom-format
pip list | grep h5py

Have you run PICRUSt successfully in the past (prior to running this pipeline)?

eberdan commented 6 years ago

biom, version 2.1.6
h5py (2.5.0)

I have never tried running PICRUSt before. I am running it now on a Linux cluster:

Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The software was installed on the cluster for me. Nobody else has used it.

vmaffei commented 6 years ago

I spoke too soon! There were in fact a few issues with the KEGG metadata in the attached precalculated file (none of which were errors on your part!).

Give this corrected file a try in predict_metagenomes.py: ko_precalculated.fix.tab.gz

Let me know if this works for you. If it does, I'll update the pipeline to fix this error in the future.

Thank you for your patience!

eberdan commented 6 years ago

It worked! Thank you so much!!!

vmaffei commented 6 years ago

No problem at all. The pipeline has been updated as well. Thanks again for bringing this to my attention!!