vmaffei / dada2_to_picrust

Experimental pipeline to perform de novo PICRUSt on de-noised amplicon sequence variants (ASV)

format_tree_and_trait_table.py RuntimeError: Delimiter ' ' not in line #4

Open josemseoane opened 7 years ago

josemseoane commented 7 years ago

Hi @vmaffei !

I am running into the following problem when formatting the tree on a Fedora system with a fully working installation of QIIME and PICRUSt 1.1.1. When I run:

format_tree_and_trait_table.py -t ./genome_prediction/study_tree.tree -i /home/jmseoane/bioinfo/dada2picRustTabs/gg_ko_counts.tab -o ./genome_prediction/format/KEGG/

I get:

2525968 internal_node_2525968 True
2525969 661200 True
2525970 969420 True
Traceback (most recent call last):
  File "/usr/bin/format_tree_and_trait_table.py", line 276, in <module>
    main()
  File "/usr/bin/format_tree_and_trait_table.py", line 181, in main
    verbose=opts.verbose)
  File "/usr/lib/python2.7/site-packages/picrust/format_tree_and_trait_table.py", line 185, in reformat_tree_and_trait_table
    trait_table_fields,delimiter=input_trait_table_delimiter)
  File "/usr/lib/python2.7/site-packages/picrust/format_tree_and_trait_table.py", line 472, in filter_table_by_presence_in_tree
    for fields in trait_table_fields:
  File "/usr/lib/python2.7/site-packages/picrust/format_tree_and_trait_table.py", line 580, in convert_trait_table_entries
    for fields in trait_table_fields:
  File "/usr/lib/python2.7/site-packages/picrust/parse.py", line 94, in yield_trait_table_fields
    ",".join(possible_delimiters)))
RuntimeError: Delimiter ' ' not in line. The following delimiters were found: tab. Is the correct delimiter one of these?

I have tried to fix it by specifying the delimiter as follows:

format_tree_and_trait_table.py -t ./genome_prediction/study_tree.tree --input_table_delimiter 'tab' -i /home/jmseoane/bioinfo/dada2picRustTabs/gg_ko_counts.tab -o ./genome_prediction/format/KEGG/

Then I get:

RuntimeError: Delimiter ' ' not in line. The following delimiters were found: tab,space. Is the correct delimiter one of these?

Something must be wrong with my tree (32.1 MB), even though I followed all the previous steps without any issues. Do you have any suggestions as to what could be wrong?

Thank you very much in advance for any help!

Best regards,

Jose

vmaffei commented 7 years ago

Hey @josemseoane ! A couple of things come to mind. Generally, I get that error whenever this step in Part 0 fails:

sed -i '/^\s*$/d' gg_ko_counts.tab

Unfortunately, this command prints nothing whether or not it succeeds, so it's easy to overlook when troubleshooting. It deletes the empty and whitespace-only lines left over from the original ko_13_5_precalculated.tab.gz file, which cause the Delimiter error in the format_tree_and_trait_table.py step. Try rerunning

sed -i '/^\s*$/d' gg_ko_counts.tab

once more on your gg_ko_counts.tab file (or consider redoing Part 0 altogether) and then repeat:

format_tree_and_trait_table.py -t ./genome_prediction/study_tree.tree -i /home/jmseoane/bioinfo/dada2picRustTabs/gg_ko_counts.tab -o ./genome_prediction/format/KEGG/
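Because the sed step is silent either way, it can help to confirm that no empty or whitespace-only lines remain before rerunning the formatting step. Here is a minimal Python sketch of such a check; the function name find_blank_lines is only illustrative and is not part of PICRUSt or this pipeline:

```python
import re


def find_blank_lines(path):
    """Return the 1-based line numbers of empty or whitespace-only lines."""
    # Same pattern the sed command deletes: lines made only of whitespace.
    blank = re.compile(r"^\s*$")
    with open(path) as handle:
        # Stream the file line by line so large trait tables are not
        # loaded into memory all at once.
        return [n for n, line in enumerate(handle, start=1) if blank.match(line)]
```

Calling find_blank_lines("gg_ko_counts.tab") should return an empty list once the cleanup has worked. Any line numbers it does return point at rows that are likely to trigger the Delimiter error.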

If that doesn't work, check whether you're running GNU sed or another sed version. You can check by running:

sed

and scrolling to the bottom of the documentation that pops up. You should see something like

GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.

The sed step above as written works for GNU sed, but it may not for other sed versions.
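If GNU sed isn't available, the same cleanup can be done without sed. The following is a rough Python equivalent, offered as a sketch rather than a tested part of the pipeline; drop_blank_lines is a made-up name, and unlike sed it reports how many lines it removed:

```python
import os
import re
import shutil
import tempfile


def drop_blank_lines(path):
    """Rewrite `path` in place without empty or whitespace-only lines.

    Returns the number of lines that were removed.
    """
    blank = re.compile(r"^\s*$")
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write the kept lines to a temporary file next to the original,
    # then move it over the original once everything has been copied.
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            if blank.match(line):
                removed += 1
            else:
                dst.write(line)
    shutil.move(dst.name, path)
    return removed
```

For example, drop_blank_lines("gg_ko_counts.tab") rewrites the table and returns the count of deleted lines. A return value of 0 means the file was already clean.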

Lastly, the pipeline was written using PICRUSt 1.0.0 and hasn't been tested on the latest version 1.1.1. Let me know whether any of the above suggestions work for you. If you're still having trouble, I'll run 1.1.1 to see if I get the same error!

josemseoane commented 7 years ago

Great! Thank you very much for your rapid and detailed answer, @vmaffei! I won't be able to give it a try until next week, but I will let you know how it went as soon as I can get my hands back on it!

josemseoane commented 7 years ago

Hi @vmaffei, you were right: the error was fixed just by repeating step 0. However, I am now running into trouble when executing the ancestral_state_reconstruction.py script which, as you know, uses the outputs from the previous steps. I suspect this is because I am running out of RAM (I have 64 GB and the computer is using all of it). What do you think? This is what I got:

[jmseoane@tetox 515fb_primers]$ ancestral_state_reconstruction.py -i ./genome_prediction/format/KEGG/trait_table.tab -t ./genome_prediction/format/KEGG/pruned_tree.newick -o ./genome_prediction/asr/KEGG_asr_counts.tab -c ./genome_prediction/asr/asr_ci_KEGG.tab
sh: line 1: 2669 Killed R -f /usr/lib/python2.7/site-packages/picrust/support_files/R/ace.R --args "./genome_prediction/format/KEGG/pruned_tree.newick" "./genome_prediction/format/KEGG/trait_table.tab" pic /tmp/tmp6O913p5Gbccim2L3V1Ay.txt /tmp/tmpmRUgpPuUMU7QRgd1KUPr.txt > "/tmp/tmpYhMAoDBNmvsiU47nVIsV.txt" 2> "/tmp/tmpBWQzGgWK95i2ge9nIy59.txt"
Traceback (most recent call last):
  File "/usr/bin/ancestral_state_reconstruction.py", line 91, in <module>
    main()
  File "/usr/bin/ancestral_state_reconstruction.py", line 74, in main
    asr_table,ci_table = ace_for_picrust(opts.input_tree_fp,opts.input_trait_table_fp,'pic',HALT_EXEC=opts.debug)
  File "/usr/lib/python2.7/site-packages/picrust/ace.py", line 95, in ace_for_picrust
    " %s" % "\n".join(result["StdErr"].readlines()))
RuntimeError: R reported an error on stderr: [Previously saved workspace restored]

Thanks a lot!

Jose

vmaffei commented 7 years ago

Hey @josemseoane! By chance, do you have access to a machine with more RAM to test on? I just ran the same script on a study with 1500+ sequences and RAM usage never peaked above 48 GB. Is it possible there are other processes you could close to free up some memory? How many denoised sequences are you running?
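On Linux, one rough way to see how much memory is actually free before launching ancestral_state_reconstruction.py is to read /proc/meminfo. This is a minimal sketch, assuming a kernel that reports the MemAvailable field. The function names are illustrative and not part of PICRUSt:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Keep only fields whose first token is a plain integer.
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(meminfo_path="/proc/meminfo"):
    """Return MemAvailable in GiB, or None if the field is not reported."""
    with open(meminfo_path) as handle:
        info = parse_meminfo(handle.read())
    kib = info.get("MemAvailable")
    return None if kib is None else kib / (1024.0 * 1024.0)
```

Running available_gib() just before the ASR step gives a quick sense of how much headroom the R process will have. It only reports what is free at that moment, not how much the step will need.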