picrust / picrust2

Code, unit tests, and tutorials for running PICRUSt2
GNU General Public License v3.0

place_seqs.py fails when using my fasta file #75

Closed: rhizorick closed this issue 5 years ago

rhizorick commented 5 years ago

Hello, I have picrust2.1.0-b, pandas 0.23.4, and Python 3.7.1 installed. I'm trying to run place_seqs.py, and it fails with the error below when I use my otu.fasta file but not when I use the tutorial one. That said, the process has never actually finished on the tutorial FASTA either. Help is much appreciated. Thanks :)

Error running this command: hmmalign --trim --dna --mapali /home/rick/Desktop/picrust2-2.1.0-b/picrust2/default_files/prokaryotic/pro_ref/pro_ref.fna --informat FASTA -o placement_working/query_align.stockholm /home/rick/Desktop/picrust2-2.1.0-b/picrust2/default_files/prokaryotic/pro_ref/pro_ref.hmm /media/rick/DATADRIVE1/Rick/WSU Projects/Biosolid Studies_Nate/2017_7_3_062317NS515F_fasta-qual-mapping/fasta-qual-mapping-files/Individual_Samples_Files/openref/otus.fna

STDOUT of failed command:

ERROR: Incorrect number of command line arguments. Usage: hmmalign [-options]
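A possible contributing factor, not confirmed anywhere in this thread: the input path in the failing command contains spaces (`WSU Projects`, `Biosolid Studies_Nate`). If the command string is split on whitespace before it reaches hmmalign, that one path becomes several arguments, which would be consistent with the "Incorrect number of command line arguments" message. The sketch below uses Python's standard `shlex` module to illustrate the splitting. The shortened path and the `count_args` helper are hypothetical, and how PICRUSt2 actually launches hmmalign is an assumption here.

```python
import shlex

# Hypothetical, shortened path modeled on the one in the error message.
fasta = "/media/rick/WSU Projects/Biosolid Studies_Nate/otus.fna"


def count_args(command: str) -> int:
    """Count the arguments a POSIX shell would pass to the program."""
    return len(shlex.split(command)) - 1  # drop the program name itself


# Same command, with the FASTA path left unquoted and then quoted.
unquoted = f"hmmalign --trim --dna -o out.stockholm pro_ref.hmm {fasta}"
quoted = f"hmmalign --trim --dna -o out.stockholm pro_ref.hmm {shlex.quote(fasta)}"

print(count_args(unquoted))  # 8: the path was split into three pieces
print(count_args(quoted))    # 6: the path stays a single argument
```

If this turns out to be the cause, copying or symlinking the FASTA file to a path without spaces is a simple workaround to try.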

gavinmdouglas commented 5 years ago

Hi @rhizorick ,

Could you try re-installing the latest version of PICRUSt2 (v2.1.2-b) with bioconda (see: https://github.com/picrust/picrust2/wiki/Installation#install-from-bioconda) and see if that fixes the problem? There seems to be a version incompatibility with hmmer, which shouldn't be happening if the tool is installed in a fresh conda environment.

Best,

Gavin

rhizorick commented 5 years ago

Hello, I tried this, but still have not been able to get PICRUSt2 to successfully run my data or the tutorial data. Below is the error I receive. Thoughts?

(picrust2) bioinfo1@Bioinfo1:~$ picrust2_pipeline.py -s '/home/bioinfo1/Desktop/tutorial_seqs.fna' -i '/home/bioinfo1/Desktop/tutorial.biom' -o picrusttest --threads 4

Error running this command: place_seqs.py --study_fasta /home/bioinfo1/Desktop/tutorial_seqs.fna --ref_dir /home/bioinfo1/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref --out_tree picrusttest/out.tre --threads 4 --intermediate picrusttest/intermediate/place_seqs --chunk_size 5000

Standard output of failed command: ""

Standard error of failed command: " Error running this command: epa-ng --tree /home/bioinfo1/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.tre --ref-msa picrusttest/intermediate/place_seqs/ref_seqs_hmmalign.fasta --query picrusttest/intermediate/place_seqs/study_seqs_hmmalign.fasta --chunk-size 5000 -T 4 -m /home/bioinfo1/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.model -w picrusttest/intermediate/place_seqs/epa_out --filter-acc-lwr 0.5 --filter-max 20

Standard output of failed command: "INFO Selected: Output dir: picrusttest/intermediate/place_seqs/epa_out/
INFO Selected: Query file: picrusttest/intermediate/place_seqs/study_seqs_hmmalign.fasta
INFO Selected: Tree file: /home/bioinfo1/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.tre
INFO Selected: Reference MSA: picrusttest/intermediate/place_seqs/ref_seqs_hmmalign.fasta
INFO Selected: Filtering by accumulated threshold: 0.5
INFO Selected: Maximum number of placements per query: 20
INFO Selected: Automatic switching of use of per rate scalers
INFO Selected: Preserving the root of the input tree
INFO Selected: Specified model file: /home/bioinfo1/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.model
INFO Rate heterogeneity: GAMMA (4 cats, mean), alpha: 0.453141 (user), weights&rates: (0.25,0.0250674) (0.25,0.220229) (0.25,0.782933) (0.25,2.97177)
Base frequencies (user): 0.229585 0.22008 0.298596 0.251739
Substitution rates (user): 1.00319 2.79077 1.5301 0.87441 3.83966 1
INFO Selected: Reading queries in chunks of: 5000
INFO Selected: Using threads: 4
INFO [epa-ng ASCII-art banner] (v0.3.5)
INFO Output file: picrusttest/intermediate/place_seqs/epa_out/epa_result.jplace "

Standard error of failed command: "" "

gavinmdouglas commented 5 years ago

Hmm, what operating system are you running, and how much memory does your system have?

shlhgsh1127 commented 5 years ago

I actually have a similar problem. Can you advise how to fix this? Many thanks!

Error running this command: place_seqs.py --study_fasta otus_resampled.fna --ref_dir /home/g/anaconda2/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref --out_tree picrust2/out.tre --threads 1 --intermediate picrust2/intermediate/place_seqs --chunk_size 5000

Standard output of failed command: ""

Standard error of failed command: " Error running this command: epa-ng --tree /home/g/anaconda2/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.tre --ref-msa picrust2/intermediate/place_seqs/ref_seqs_hmmalign.fasta --query picrust2/intermediate/place_seqs/study_seqs_hmmalign.fasta --chunk-size 5000 -T 1 -m /home/g/anaconda2/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.model -w picrust2/intermediate/place_seqs/epa_out --filter-acc-lwr 0.5 --filter-max 20

Standard output of failed command: "INFO Selected: Output dir: picrust2/intermediate/place_seqs/epa_out/
INFO Selected: Query file: picrust2/intermediate/place_seqs/study_seqs_hmmalign.fasta
INFO Selected: Tree file: /home/g/anaconda2/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.tre
INFO Selected: Reference MSA: picrust2/intermediate/place_seqs/ref_seqs_hmmalign.fasta
INFO Selected: Filtering by accumulated threshold: 0.5
INFO Selected: Maximum number of placements per query: 20
INFO Selected: Automatic switching of use of per rate scalers
INFO Selected: Preserving the root of the input tree
INFO Selected: Specified model file: /home/g/anaconda2/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.model
INFO Rate heterogeneity: GAMMA (4 cats, mean), alpha: 0.453141 (user), weights&rates: (0.25,0.0250674) (0.25,0.220229) (0.25,0.782933) (0.25,2.97177)
Base frequencies (user): 0.229585 0.22008 0.298596 0.251739
Substitution rates (user): 1.00319 2.79077 1.5301 0.87441 3.83966 1
INFO Selected: Reading queries in chunks of: 5000
INFO Selected: Using threads: 1
INFO [epa-ng ASCII-art banner] (v0.3.5)
INFO Output file: picrust2/intermediate/place_seqs/epa_out/epa_result.jplace "

Standard error of failed command: "" "

Cheers, Shuhong

shlhgsh1127 commented 5 years ago

I am running this on Ubuntu, and the machine should have 8 GB of memory, I suppose. Sorry, I am new to PICRUSt2.

Thanks again!

Cheers, Shuhong

rhizorick commented 5 years ago

I'm using Ubuntu 14.04 with 4 GB of RAM.

gavinmdouglas commented 5 years ago

Hi @shlhgsh1127 and @rhizorick - I think the likely issue is that you don't have enough RAM. Typically you would need at least 16 GB of RAM to run a dataset with PICRUSt2. If you use the QIIME2 plugin, you can try SEPP instead of the default place_seqs.py pipeline; SEPP requires less RAM (although 4 GB might still not be enough): https://github.com/picrust/picrust2/wiki/q2-picrust2-Tutorial
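As a rough way to check a machine against the 16 GB figure mentioned above before starting a run, the following sketch reads total physical memory through POSIX `sysconf`. It is not part of PICRUSt2: the helper names are hypothetical, and the `sysconf` keys are assumed to be available (they are on Linux).

```python
import os

RECOMMENDED_GB = 16  # figure quoted in the reply above, not a hard limit


def total_ram_gb() -> float:
    """Total physical memory in GiB, read via POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


def has_enough_ram(total_gb: float, needed_gb: float = RECOMMENDED_GB) -> bool:
    """Compare a memory size against the recommended minimum."""
    return total_gb >= needed_gb


if __name__ == "__main__":
    ram = total_ram_gb()
    verdict = "ok" if has_enough_ram(ram) else "probably too little"
    print(f"{ram:.1f} GiB detected: {verdict}")
```

The operating system usually reports slightly less than the nominal amount, so a machine sold as 16 GB may come out just under the threshold; treat the result as a hint rather than a verdict.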

shlhgsh1127 commented 5 years ago

Thank you very much, Gavin, for your timely reply! I tried the server in our lab, which has much more RAM, and my dataset now runs through.

Many thanks again and wish you all the best!

Cheers, Shuhong

Francy5819 commented 5 years ago

Hello, I've installed picrust2. From the tutorial I downloaded the three files metadata.tsv, seqs.fna, and table.biom, but when I run this command: place_seqs.py -s ../seqs.fna -o out.tre -p 1 --intermediate intermediate/place_seqs it displays this:

Error running this command: hmmalign --trim --dna --mapali /home/francesca/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.fna.gz --informat FASTA -o intermediate/place_seqs/query_align.stockholm /home/francesca/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.hmm ../seqs.fna

Standard output of failed command: ""

Standard error of failed command: " Error: Failed to open sequence file ../seqs.fna for reading

"

Can someone help me? Thank you very much!

gavinmdouglas commented 5 years ago

Hey @Francy5819,

Could you post the first few lines of your FASTA? Have you compared your file to the tutorial file (see here)?
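The "Failed to open sequence file ../seqs.fna" message suggests that the relative path may not resolve from the directory the command was run in. A quick pre-check along these lines can separate a wrong path from a malformed file. This is only a sketch: the `check_fasta` helper is hypothetical and not part of PICRUSt2, and it checks just the first line of the file.

```python
from pathlib import Path


def check_fasta(path_str: str) -> str:
    """Report whether a path resolves to a file that starts like FASTA."""
    # Resolve relative paths (such as ../seqs.fna) against the current directory.
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        return f"missing: {path}"
    with path.open() as handle:
        first_line = handle.readline()
    if not first_line.startswith(">"):
        return f"not FASTA: first line of {path} does not start with '>'"
    return f"ok: {path}"


if __name__ == "__main__":
    print(check_fasta("../seqs.fna"))
```

Printing the resolved absolute path makes it easy to see where `../seqs.fna` actually points from the current working directory.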

najouamghazli commented 5 years ago

Hello,

I encountered the same problem when running picrust2 on my own dataset.

I ran the following command:

(picrust2) najoua@vidourle:~/ciseaux/Analyses/PICRUSt$ picrust2_pipeline.py -s PicrustFasta.fasta -i PicrustBiom.biom -o Output_Picrust -p 1

And got this error :

Error running this command:
place_seqs.py --study_fasta PicrustFasta.fasta --ref_dir /home/najoua/miniconda2/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref --out_tree Output_Picrust/out.tre --processes 1 --intermediate Output_Picrust/intermediate/place_seqs --chunk_size 5000

Standard output of failed command:
""
Standard error of failed command:
"Traceback (most recent call last):
  File "/home/najoua/.local/bin/place_seqs.py", line 8, in <module>
    from picrust2.place_seqs import place_seqs_pipeline
  File "/home/najoua/.local/lib/python2.7/site-packages/picrust2/place_seqs.py", line 65
    def run_papara(tree: str, ref_msa: dict, study_fasta: str, out_dir: str,
                       ^
SyntaxError: invalid syntax
"

I am using Ubuntu 18.04 on a server machine with 96 GB of memory, and Python 3.

(picrust2) najoua@vidourle:~/ciseaux/Analyses/PICRUSt$ python --version
Python 3.6.7
(picrust2) najoua@vidourle:~/ciseaux/Analyses/PICRUSt$ head PicrustFasta.fasta
>Cluster1
TACAGAGGGTGCGAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGCAGGCGGCCTTTTAAGTCTCACCGTTCAAGCCCGGAGCTCAACTCCGGTCCGCGGGGGATACTGGGAGGCTAGGGACATGTAGGGGCGAGCGGAATTCCGGGTGTAGCGGTGGAATGCGTAGAGATCCGGAAGAACACCGGGGGCGAAGGCGGCTCGCTGGGCATGGTCCGACGCTGAGGCGCGAAAGCGTGGGGAGCGAACAGG
>Cluster2
TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTTGTAAGTCTGACGTGAAAGCCCCGGGCTCAACCTGGGAATTGCGTTGGAGACTGCAAGGCTTGAATCTGGCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAACACCGATGGCGAAGGCAGCCCCCTGGGTCAAGATTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGG
>Cluster3
TACGGAGGGAGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCACGTAGGCGGCTTTGTAAGTTAGAGGTGAAAGCCCGGGGCTCAACTCCGGAACTGCCTTTAAGACTGCATCGCTTGAATCCAGGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAACACCAGTGGCGAAGGCGGCTCACTGGACTGGTATTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGG
(picrust2) najoua@vidourle:~/ciseaux/Analyses/PICRUSt$ biom head -i PicrustBiom.biom
# Constructed from biom file
#OTU ID 35      36      37      38      39
Cluster1        2666.0  1760.0  30045.0 39546.0 11013.0
Cluster2        54227.0 59778.0 32495.0 41481.0 11566.0
Cluster3        22105.0 16146.0 23385.0 20337.0 10941.0

Best,

Najoua

gavinmdouglas commented 5 years ago

Hi @najouamghazli,

In your error message it looks like picrust2 is installed in your conda root directory (i.e. not in a subfolder representing the environment "picrust2"):

  File "/home/najoua/.local/bin/place_seqs.py", line 8, in <module>

Python 2.7 is installed in this environment, which is why you're getting that error:

  File "/home/najoua/.local/lib/python2.7/site-packages/picrust2/place_seqs.py", line 65

I think there must have been an error when installing the tool and/or there are issues with your conda configuration.

Best,

Gavin
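The paths in the traceback sit under `~/.local`, which is where `pip install --user` places packages and scripts on Linux, so an older copy there may be picked up ahead of the one in the active environment depending on `PATH` order. The sketch below shows one way to check which `place_seqs.py` would run and whether it belongs to the active environment. The `inside_prefix` helper is hypothetical, and using `sys.prefix` as the environment root is an assumption that holds for a typical activated conda environment.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def inside_prefix(script: Optional[str], prefix: str) -> bool:
    """True if `script` is located somewhere under `prefix`."""
    if script is None:
        return False
    try:
        Path(script).relative_to(prefix)
        return True
    except ValueError:
        # relative_to raises ValueError when script is not under prefix.
        return False


if __name__ == "__main__":
    script = shutil.which("place_seqs.py")  # first match on PATH, or None
    print("interpreter:", sys.executable)
    print("python version:", sys.version.split()[0])
    print("place_seqs.py found at:", script)
    print("inside active environment:", inside_prefix(script, sys.prefix))
```

Run with the environment activated, a `False` on the last line would point to a stray installation outside the environment, as in the traceback quoted above.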

najouamghazli commented 5 years ago

Hi @gavinmdouglas,

I did reinstall PICRUSt from source and it's now working just fine. Thank you so much for your help.

Regards,

Najoua

PFRoux commented 5 years ago

Hi !

In case it is useful to other PICRUSt2 users: I had the exact same error using v2.2.0 on a 2019 MacBook Pro (2.6 GHz Intel Core i7, 16 GB 2400 MHz DDR4).

I figured out that I was too greedy when running processes in parallel. I lowered the -p argument and it worked smoothly.

Cheers !

Peu

cindynino commented 4 years ago

Hello, I have picrust2.1.0-b installed. I'm trying to run picrust2_pipeline.py -s cin_final_final_este_si.fna -i table.from_txt.biom -o picrust2_out_pipeline -p 1 and the process fails with the following error when I use my fasta file, but not the tutorial one. Below is the error I get. Help is much appreciated. Thanks :)

Error running this command: place_seqs.py --study_fasta cin_final_final_este_si.fna --ref_dir /opt/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref --out_tree picrust2_out_pipeline/out.tre --processes 1 --intermediate picrust2_out_pipeline/intermediate/place_seqs --min_align 0.8 --chunk_size 5000

Standard error of the above failed command:

Error running this command: hmmalign --trim --dna --mapali /opt/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.fna.gz --informat FASTA -o picrust2_out_pipeline/intermediate/place_seqs/query_align.stockholm /opt/anaconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/prokaryotic/pro_ref/pro_ref.hmm cin_final_final_este_si.fna