wwood / singlem

Novelty-inclusive microbial community profiling of shotgun metagenomes
http://wwood.github.io/singlem/
GNU General Public License v3.0

SingleM error #21

Closed by evensannesriiser 6 years ago

evensannesriiser commented 6 years ago

Dear Ben,

I've been running CheckM and RefineM on my metagenomic bins, and after a discussion with Donovan Parks regarding abundance estimation, he recommended your SingleM software. I think this looks very promising; however, I am struggling a bit to get it up and running. I've installed all the dependencies and verified that they work. Still, when I run the command:

singlem pipe --sequences $SINGLEM/input/allreads.fastq --otu_table $SINGLEM/otu_table.csv --threads 32

I get the following error:

Traceback (most recent call last):
  File "/usit/abel/u1/evensr/.local/bin/singlem", line 265, in <module>
    known_sequence_taxonomy = args.known_sequence_taxonomy)
  File "/usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/pipe.py", line 62, in run
    hmms = HmmDatabase(singlem_packages)
  File "/usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/singlem.py", line 40, in __init__
    pkg_paths = [d for d in os.listdir(db_directory) if d[-5:]=='.spkg']
OSError: [Errno 2] No such file or directory: '/cluster/home/evensr/.local/lib/python2.7/site-packages/singlem/../db'

(Note that "allreads.fastq" contains all paired forward, paired reverse and singleton reads, simply concatenated rather than interleaved.)

As I read the error message, the issue seems to be a missing database. I'm a bit confused about the prerequisites for running SingleM: do I need outputs from GraftM? Is my error related to a missing GraftM output? Do you have any other suggestions as to why I get this error?

Kind regards,

Even Sannes Riiser, PhD candidate, University of Oslo, Norway

wwood commented 6 years ago

Hi Even,

Thanks for trying SingleM. This is an issue with how SingleM itself was installed. I suppose you installed it using something like

pip install --user singlem

?

If so, I get the same issue (the bundled SingleM packages are not used). I am working on fixing this, but in the meantime as a workaround you can download SingleM from https://github.com/wwood/singlem/archive/v0.8.1.tar.gz and then just run it by adding the bin directory to your $PATH.

Again, apologies for this. There are multiple ways to install PyPI packages, and it seems only some of them work right now.
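
For anyone hitting the same OSError, the check that fails in the traceback above can be reproduced by hand. This is a minimal Python 3 sketch, assuming (as the list comprehension in the traceback suggests) that SingleM packages are directory entries whose names end in `.spkg`. The function name `list_spkg_packages` and the example path are hypothetical and not part of SingleM.

```python
import os


def list_spkg_packages(db_directory):
    """Return sorted '.spkg' entry names, or None if the directory is missing."""
    if not os.path.isdir(db_directory):
        return None
    return sorted(d for d in os.listdir(db_directory) if d.endswith(".spkg"))


# Hypothetical location, for illustration only; substitute the directory
# named in your own error message.
packages = list_spkg_packages("/path/to/site-packages/singlem/db")
if packages is None:
    print("Package directory not found, so no bundled packages can be loaded")
else:
    print("Found %d SingleM package(s)" % len(packages))
```

If this reports a missing directory, the installation did not ship the bundled packages to the place SingleM looks for them, which matches the error in the original report.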

wwood commented 6 years ago

Hi again, I believe I've fixed it now, but would you be so kind as to test it out? Download this file and pip install it, and that error should go away: https://www.dropbox.com/s/ruc5e4r8kh7be8m/singlem-0.8.2a1.tar.gz?dl=0

Thanks in advance.

evensannesriiser commented 6 years ago

Hi Ben,

Thanks for the feedback! I installed and verified the correct version (0.8.2a1), and the "db" problem is now solved. Nevertheless, I still run into problems using "singlem pipe". When running the following command:

singlem --debug pipe --sequences $SINGLEM/input/allreads.fastq --otu_table $SINGLEM/otu_table.csv --threads 32 1> singlem.out 2> singlem.err

I find the following in the stderr file (just an excerpt, see attachment for full output):

01/10/2018 03:15:05 PM DEBUG: Returning orfm nucleotides ATCAAAGTTGGTAATACATTACCAATGCGCAACATCCCTGTAGGTTCAACAGTACACTGTGTTGAACTTAAGCCTGGTAAAGGTGCACAGCTGGCTCGTTCAGCTGGCGCATACGCTCAGATC
01/10/2018 03:15:05 PM DEBUG: For sample allreads, spkg /usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/data/4.07.ribosomal_protein_L2_rplB.gpkg.spkg, found 0 known and 2045 unknown OTU sequences
01/10/2018 03:15:05 PM INFO: Finished extracting aligned sequences
01/10/2018 03:15:05 PM INFO: Running taxonomic assignment with graftm..
01/10/2018 03:15:05 PM DEBUG: Running extern cmd: graftM graft --verbosity 5 --input_sequence_type nucleotide  --min_orf_length 96   --filter_minimum 28  --threads 32 --forward /dev/shm/tmplUpiI0/tempfiles/singlem.allreads.JtgM0R.fasta --graftm_package /usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/data/4.08.ribosomal_protein_L3_rplC.gpkg.spkg/4.08.ribosomal_protein_L3_rplC --output_directory /dev/shm/tmplUpiI0/graftm_aligns/4.08.ribosomal_protein_L3_rplC --max_samples_for_krona 0 --assignment_method pplacer
Traceback (most recent call last):
  File "/usit/abel/u1/evensr/.local/bin/singlem", line 265, in <module>
    known_sequence_taxonomy = args.known_sequence_taxonomy)
  File "/usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/pipe.py", line 145, in run
    extracted_reads, graftm_assignment_method)
  File "/usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/pipe.py", line 590, in _assign_taxonomy
    extern.run_many(commands, num_threads=1)
  File "/usit/abel/u1/evensr/.local/lib/python2.7/site-packages/extern/__init__.py", line 75, in run_many
    return runner.run(commands, progress_stream=progress_stream)
  File "/usit/abel/u1/evensr/.local/lib/python2.7/site-packages/extern/multi_runner.py", line 89, in run
    raise extern.ExternCalledProcessError(*res[2:])
extern.ExternCalledProcessError: Command graftM graft --verbosity 5 --input_sequence_type nucleotide  --min_orf_length 96   --filter_minimum 28  --threads 32 --forward /dev/shm/tmplUpiI0/tempfiles/singlem.allreads.JtgM0R.fasta --graftm_package /usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/data/4.08.ribosomal_protein_L3_rplC.gpkg.spkg/4.08.ribosomal_protein_L3_rplC --output_directory /dev/shm/tmplUpiI0/graftm_aligns/4.08.ribosomal_protein_L3_rplC --max_samples_for_krona 0 --assignment_method pplacer returned non-zero exit status 1.
STDERR was: 01/10/2018 03:15:09 PM DEBUG: Ran command: /usit/abel/u1/evensr/.local/bin/graftM graft --verbosity 5 --input_sequence_type nucleotide --min_orf_length 96 --filter_minimum 28 --threads 32 --forward /dev/shm/tmplUpiI0/tempfiles/singlem.allreads.JtgM0R.fasta --graftm_package /usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/data/4.08.ribosomal_protein_L3_rplC.gpkg.spkg/4.08.ribosomal_protein_L3_rplC --output_directory /dev/shm/tmplUpiI0/graftm_aligns/4.08.ribosomal_protein_L3_rplC --max_samples_for_krona 0 --assignment_method pplacer
01/10/2018 03:15:09 PM DEBUG: Loading version 3 GraftM package: /usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/data/4.08.ribosomal_protein_L3_rplC.gpkg.spkg/4.08.ribosomal_protein_L3_rplC
01/10/2018 03:15:09 PM DEBUG: Loading version 3 GraftM package: /usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/data/4.08.ribosomal_protein_L3_rplC.gpkg.spkg/4.08.ribosomal_protein_L3_rplC

The problem seems to be related to the "graftM graft" command. This is a bit strange, as running "graftM --help" earlier in my script gives perfectly fine output. In addition, judging by the debug output file, the "graftM graft" command worked well earlier in the process. Do you have any suggestions as to why the process fails at this point?

Full standard error output: singlem.txt

Best,

Even Sannes Riiser

wwood commented 6 years ago

Hi, good to see that 0.8.2a1 worked for you.

This new error seems unrelated; the root cause looks to be an issue within GraftM (or pplacer):

01/10/2018 03:15:22 PM DEBUG: Writing search otu table to file: /dev/shm/tmplUpiI0/graftm_aligns/4.08.ribosomal_protein_L3_rplC/search_otu_table.txt
01/10/2018 03:15:22 PM DEBUG: Clustering reads
01/10/2018 03:15:22 PM DEBUG: Found 2004 reads
01/10/2018 03:15:22 PM DEBUG: Clustered to 544 groups
01/10/2018 03:15:22 PM DEBUG: Writing representative sequences of each cluster to: /dev/shm/tmplUpiI0/graftm_aligns/4.08.ribosomal_protein_L3_rplC/singlem.allreads.JtgM0R/singlem.allreads.JtgM0R_clustered.fa
01/10/2018 03:15:22 PM INFO: Placing reads into phylogenetic tree
01/10/2018 03:15:22 PM DEBUG: Running extern cmd: pplacer -j 32 --verbosity 0 --out-dir /dev/shm/tmplUpiI0/graftm_aligns/4.08.ribosomal_protein_L3_rplC -c /usit/abel/u1/evensr/.local/lib/python2.7/site-packages/singlem/data/4.08.ribosomal_protein_L3_rplC.gpkg.spkg/4.08.ribosomal_protein_L3_rplC/2.08.ribosomal_protein_L3_rplC.gpkg.refpkg /dev/shm/tmplUpiI0/graftm_aligns/4.08.ribosomal_protein_L3_rplC/combined_alignment.aln.fa
01/10/2018 03:15:45 PM INFO: Placements finished
01/10/2018 03:15:45 PM INFO: Reading classifications
01/10/2018 03:15:45 PM WARNING: null placement encountered in group: D00564:80:CB1KKANXX:1:2201:12067:74163/1_1_4_2_0
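
In a long --debug log, lines like the final WARNING above are easy to miss, so filtering by log level can help. This is a minimal Python 3 sketch, assuming every log line follows the "date time AM/PM LEVEL: message" layout seen in the excerpts here. `find_problem_lines` is a hypothetical helper, not part of SingleM or GraftM.

```python
import re

# Matches lines such as "01/10/2018 03:15:45 PM WARNING: message".
LOG_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) (\w+): (.*)$"
)


def find_problem_lines(log_text, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return (timestamp, level, message) tuples for lines at the given levels."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group(2) in levels:
            hits.append(match.groups())
    return hits


sample = (
    "01/10/2018 03:15:45 PM INFO: Reading classifications\n"
    "01/10/2018 03:15:45 PM WARNING: null placement encountered in group: "
    "D00564:80:CB1KKANXX:1:2201:12067:74163/1_1_4_2_0\n"
)
for timestamp, level, message in find_problem_lines(sample):
    print(level, message)
```

Traceback lines carry no timestamp and are skipped by this pattern, so the filter complements the traceback rather than replacing it.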

Would it be possible to send me your input data? If you like you can email me at "b dot woodcroft at uq.edu.au" or upload it somewhere public.

Probably the easiest way to do so is to run singlem like this:

singlem --debug pipe --sequences $SINGLEM/input/allreads.fastq --otu_table $SINGLEM/otu_table.csv --threads 32 --working_directory singlem_working_directory 1> singlem.out 2> singlem.err

And then send the singlem_working_directory as a tar.gz. That will contain only those reads that have been picked up so you don't have to send all of your input reads.
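
Packing the directory can be done with any tar tool. As one option, here is a minimal Python 3 sketch using the standard `tarfile` module. The helper name `archive_directory` is hypothetical, and the sketch assumes the working directory sits in the current directory under the name used in the command above.

```python
import os
import tarfile


def archive_directory(directory, archive_path=None):
    """Write <directory>.tar.gz next to the directory and return its path."""
    directory = os.path.normpath(directory)
    if archive_path is None:
        archive_path = directory + ".tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name, not its full path.
        tar.add(directory, arcname=os.path.basename(directory))
    return archive_path


if os.path.isdir("singlem_working_directory"):
    print(archive_directory("singlem_working_directory"))
```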

Thanks.

In the meantime, as a workaround, you could pass the diamond taxonomy assignment flag, --assignment_method diamond, to singlem pipe, as the error seems to be specific to the pplacer method.
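
For scripted runs, the workaround amounts to appending one flag to the earlier command. This is a minimal Python 3 sketch that builds the argument list. It uses only flags that appear in this thread, while the helper name `build_pipe_command` and the relative file paths are hypothetical.

```python
import shlex


def build_pipe_command(sequences, otu_table, threads, assignment_method=None):
    """Build the argument list for 'singlem pipe' as used in this thread."""
    cmd = [
        "singlem", "pipe",
        "--sequences", sequences,
        "--otu_table", otu_table,
        "--threads", str(threads),
    ]
    if assignment_method is not None:
        cmd += ["--assignment_method", assignment_method]
    return cmd


cmd = build_pipe_command(
    "input/allreads.fastq", "otu_table.csv", 32, assignment_method="diamond"
)
# Print the command for inspection; to execute it, pass cmd to
# subprocess.check_call on a machine where singlem is installed.
print(" ".join(shlex.quote(part) for part in cmd))
```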

evensannesriiser commented 6 years ago

Hi Ben,

Thanks! I just sent you the link to my singlem_working_directory.tar.gz via email. Hope it contains what you requested. Let me know otherwise!

I will try the DIAMOND strategy and let you know how it goes!

Best,

Even

wwood commented 6 years ago

In discussions over email, the problem turned out to be an out-of-date pplacer installation.
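
Given that outcome, it is worth checking which pplacer is on the PATH and what version it reports before running the pipeline. This is a minimal Python 3 sketch, assuming the tool accepts a `--version` flag. `tool_version` is a hypothetical helper, and which pplacer versions are recent enough is not stated in this thread.

```python
import shutil
import subprocess


def tool_version(executable, version_flag="--version"):
    """Return (path, version output) for an executable, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, version_flag],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print their version to stderr
        universal_newlines=True,
    )
    return path, result.stdout.strip()


info = tool_version("pplacer")
if info is None:
    print("pplacer not found on PATH")
else:
    print("%s reports: %s" % info)
```

Printing the resolved path as well as the version helps when several installations coexist, for example one under ~/.local/bin and another system-wide.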