Hi @bfoster-lbl thanks for the report. I think the short answer here is that you are using a version that is a bit old.
I guess you are using a docker version? We have an updated one with GTDB r202 annotations right now, and anticipate having an r207 version (and a better method for assigning taxonomy to OTUs) within the next few weeks.
How would you like to proceed?
We are currently using docker image wwood/singlem:0.13.2-dev11.a6cc1b4. Is there a docker image or GitHub release for a current stable version? A docker image we can pull from DockerHub would be preferable. We are trying to run this on all the metagenome datasets we generate (several thousand per year), but we need fewer failures before it can be implemented in production.
Hi, unfortunately there is no stable release yet. We anticipate having this within the next month or two, but we are wanting to make some changes to the way OTUs are assigned taxonomy first (this will finally merge ~350 commits from the dev branch into main).
There is an updated docker image at public.ecr.aws/m5a0r7u5/singlem-wdl:0.13.2-dev37.e97d171 which I believe is publicly available. We have used that image extensively on thousands of public metagenomes. However, there is a known performance bug in it where the taxonomy assignment step takes longer (triple the time?) than it should. If you want to use this let me know and I can provide an example command line invocation.
The specific error you are seeing here is new to me, btw, but looks like an error with a fastx file being unexpectedly truncated. Hopefully an updated version will make that go away though.
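Since the error looks like a truncated fastx file, one cheap guard in a production workflow is to confirm that each input FASTQ consists of whole records before handing it to SingleM. The Python sketch below is illustrative only: `fastq_is_complete` is a hypothetical helper (not part of SingleM), it assumes unwrapped four-line FASTQ records, and it only inspects the input files. The file mfqe failed on in the reported log appears to be an intermediate prefilter output, so a clean result here does not by itself rule the problem out.

```python
import gzip


def fastq_is_complete(path):
    """Return (ok, message) for a plain or gzipped FASTQ file.

    Hypothetical helper, not part of SingleM. Assumes one-line sequences,
    i.e. every record is exactly four lines: '@header', sequence, '+', quality.
    Streams the file so large metagenomes are not loaded into memory.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    records = 0
    with opener(path, "rt") as handle:
        while True:
            start_line = records * 4 + 1
            # readline() returns '' only at end of file
            chunk = [handle.readline() for _ in range(4)]
            if not chunk[0]:
                return True, f"{records} complete record(s)"
            if not all(chunk):
                # End of file reached partway through a record
                return False, f"record at line {start_line} ends early (truncated file?)"
            header, seq, plus, qual = (part.rstrip("\r\n") for part in chunk)
            if not header.startswith("@") or not plus.startswith("+"):
                return False, f"malformed record at line {start_line}"
            if len(seq) != len(qual):
                return False, f"sequence/quality length mismatch at line {start_line}"
            records += 1
```

A workflow could call this on r1/r2 before the SingleM step and fail fast, with a clear message, instead of failing deep inside the pipeline.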
Really happy to see this still in testing at JGI - apologies things haven't stabilised as quickly as hoped.
Hi Ben, Is there any progress for the new release?
Hi Brian,
It has taken a bit longer than anticipated, sorry, but we are still working on this. We have updated r207 reference data and have mostly finished developing the new algorithm; now we are just gathering the pieces together to make it usable for others, getting tests to pass, etc. I hope to have a beta release by the end of this week. After ~500 commits, it will be momentous to merge dev back into main...
I wonder if it would make sense to catch up quickly over Zoom after that to discuss what specifically you are looking for from this tool? It addresses a few related problems.
Thanks, ben
Hi Ben, Briefly, we'd like to run SingleM for taxonomic analysis on all metagenomic datasets generated at JGI, approximately 3,000 datasets a year. I believe Simon Roux has been in contact with you about this previously. Simon, Brian and I all work together and are trying to achieve the same goal. Let us know if you'd like to set up a meeting to discuss further. Thanks, Alicia
Hi Alicia,
OK, makes sense. That scale shouldn't be a problem - the tool is not particularly RAM or CPU intensive and we've run at larger scales already.
There are some perhaps more advanced use cases that might be of interest, e.g. relating recovered genomes to raw reads to see how many/which were assembled/binned, updating when new taxonomic reference data emerge, or estimating microbial alpha diversity, but maybe we can leave that discussion until after you've had a chance to test out the taxonomic profiling.
Will be in touch about a docker image you can try out.
ben
Hi Ben, we are currently using the container with label "wwood/singlem:0.13.2-dev11.a6cc1b4" is there a newer version?
(In reply to Ben J Woodcroft, Tue, Jul 19, 2022 at 12:57 PM)
Hi @bfoster-lbl @aclum
I just pushed 1.0.0beta1 as a GitHub tag and to DockerHub:
docker pull wwood/singlem:1.0.0beta1
There is also some doco at https://wwood.github.io/singlem/
Let me know how you go - I'm certainly interested in this use case. I don't know of any major bugs in it right now, but let me know if you come across any. We're nearing a 1.0.0 release, but since that was a merge of 500+ commits into main, I'm taking it slow.
Actually, just found a small one: --full-help doesn't work in the docker because man isn't installed. You can get the same info from the online doco at https://wwood.github.io/singlem/ though.
Thanks for your patience.
ben
Hi again,
That small bug is now fixed, and a new docker is available at wwood/singlem:1.0.0beta2
I'm going to close this issue for now, since it is (presumably) fixed in this new version. If not, or if you encounter other issues, let me know.
Thanks, ben
Hi Ben, Is singlem production ready? Is there a non-beta version? Thanks, Brian
(In reply to Ben J Woodcroft, Sun, Oct 16, 2022 at 5:39 PM: closed #90 as completed.)
Hi Brian, A difficult question to ask a mere bioinformatician!
I would say that the pipe subcommand, which takes an input metagenome and spits out a taxonomic profile, is in a good place. The pipe mode inside the docker image is even tested before pushing to DockerHub.
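At JGI scale the profile written by pipe will usually need summarising downstream. A minimal Python sketch, assuming the profile is a tab-separated file whose header row contains sample, coverage and taxonomy columns, with taxonomy written as "; "-separated levels (e.g. "Root; d__Bacteria"); please check this against the output of the SingleM version you deploy. `coverage_by_rank` is a hypothetical helper that sums coverage per sample for each taxon truncated to a chosen number of levels.

```python
import csv
from collections import defaultdict


def coverage_by_rank(profile_path, depth):
    """Sum coverage per (sample, taxon truncated to `depth` levels).

    ASSUMPTION: tab-separated profile with 'sample', 'coverage' and
    'taxonomy' header columns and '; '-separated taxonomy strings.
    """
    totals = defaultdict(float)
    with open(profile_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            levels = [level.strip() for level in row["taxonomy"].split(";")]
            # Rows shallower than `depth` keep their own, shorter key
            key = (row["sample"], "; ".join(levels[:depth]))
            totals[key] += float(row["coverage"])
    return dict(totals)
```

Keeping shallower rows under their own key means coverage assigned only at, say, domain level is reported as such rather than being silently pushed down to a phylum.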
Some of the other subcommands, e.g. supplement, which was just introduced in beta7, are ready for outside testing and do fine in my hands, but have UI issues that need to be fixed (e.g. not all the dependencies for that mode are included in the docker image).
I have a draft of the paper that I'm putting the finishing touches to, and SingleM does very well in the benchmarking, particularly when the species aren't currently represented in the reference database, which is a situation I imagine is seen very often at JGI. I intend to release 1.0 non-beta when I push it to bioRxiv, if not earlier.
HTH - of course feedback welcome.
Hi @bfoster-lbl @aclum Version v0.16.0 I would consider stable. There is a biorxiv now too - https://www.biorxiv.org/content/10.1101/2024.01.30.578060v1
That isn't to say there won't be issues and changes around the fringes, but the main workflow is set now. Please feel free to raise further issues or get in touch directly if helpful.
Thanks! ... I will check it out.
(In reply to Ben J Woodcroft, Tue, Mar 5, 2024 at 6:42 PM)
Hi, I am running into issues where about 30% of runs fail with the error in the subject line. These runs succeed when re-run. Do you have experience with this type of error?
I am running on AWS, and here is the error file:
07/18/2022 09:52:45 PM INFO: SingleM v0.13.2-dev11.a6cc1b4
07/18/2022 09:52:51 PM INFO: Loaded 83 SingleM packages
07/18/2022 09:53:00 PM INFO: Using as input 1 different pairs of sequence files e.g. /cromwell_root/bf-20190529-uswest2-s3/cromwell-execution/run_singlem/582ffe20-7b3d-43de-81a8-2cc0df1f93ff/call-split_reads/r1.fastq & /cromwell_root/bf-20190529-uswest2-s3/cromwell-execution/run_singlem/582ffe20-7b3d-43de-81a8-2cc0df1f93ff/call-split_reads/r2.fastq
07/18/2022 09:53:00 PM INFO: Filtering sequence files through DIAMOND blastx
07/18/2022 09:55:13 PM INFO: Finished DIAMOND prefilter phase
07/18/2022 09:55:13 PM INFO: Assigning sequences to SingleM packages with HMMSEARCH ..
07/18/2022 09:55:13 PM INFO: Searching with 83 SingleM package(s)
07/18/2022 09:55:13 PM INFO: Searching for reads matching 102 different protein HMM(s)
Traceback (most recent call last):
  File "/singlem/bin/singlem", line 584, in <module>
    diamond_taxonomy_assignment_performance_parameters = args.diamond_taxonomy_assignment_performance_parameters)
  File "/singlem/bin/../singlem/pipe.py", line 55, in run
    otu_table_object = self.run_to_otu_table(**kwargs)
  File "/singlem/bin/../singlem/pipe.py", line 267, in run_to_otu_table
    known_taxes, known_otu_tables, include_inserts)
  File "/singlem/bin/../singlem/pipe.py", line 325, in _find_and_extract_reads_by_hmmsearch
    search_result = self._search(hmms, forward_read_files, reverse_read_files)
  File "/singlem/bin/../singlem/pipe.py", line 847, in _search
    run(hmms, graftm_protein_search_directory, True)
  File "/singlem/bin/../singlem/pipe.py", line 835, in run
    extern.run(cmd)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/extern/__init__.py", line 41, in run
    raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command graftM graft --verbosity 2 --input_sequence_type nucleotide --min_orf_length 96 --filter_minimum 28 --threads 8 --forward /cromwell_root/bf-20190529-uswest2-s3/cromwell-execution/run_singlem/582ffe20-7b3d-43de-81a8-2cc0df1f93ff/call-singlem/tmp.8249a9e0/tmp7fqmo736/prefilter_forward/r1.fna --search_only --search_hmm_files /pkgs/S2.1.ribo
...
returned non-zero exit status 1. STDERR was: b'Traceback (most recent call last):\n File "/opt/conda/envs/env/bin/graftM", line 415, in <module>\n Run(args).main()\n File "/opt/conda/envs/env/lib/python3.6/site-packages/graftm/run.py", line 613, in main\n self.graft()\n File "/opt/conda/envs/env/lib/python3.6/site-packages/graftm/run.py", line 388, in graft\n diamond_db\n File "/opt/conda/envs/env/lib/python3.6/site-packages/graftm/timeit.py", line 10, in timed\n result = method(*args, **kw)\n File "/opt/conda/envs/env/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 851, in aa_db_search\n hit_reads_orfs_fasta)\n File "/opt/conda/envs/env/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 943, in search_and_extract_orfs_matching_protein_database\n hits\n File "/opt/conda/envs/env/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 534, in _extract_from_raw_reads\n extern.run(extract_cmd, stdin=\'\n\'.join(input_reads))\n File "/opt/conda/envs/env/lib/python3.6/site-packages/extern/__init__.py", line 41, in run\n raise ExternCalledProcessError(process, command)\nextern.ExternCalledProcessError: Command mfqe --output-uncompressed --fasta-read-name-lists /dev/stdin --input-fasta <(cat \'/cromwell_root/bf-20190529-uswest2-s3/cromwell-execution/run_singlem/582ffe20-7b3d-43de-81a8-2cc0df1f93ff/call-singlem/tmp.8249a9e0/tmp7fqmo736/prefilter_reverse/r2.fna\') --output-fasta-files \'/cromwell_root/bf-20190529-uswest2-s3/cromwell-execution/run_singlem/582ffe20-7b3d-43de-81a8-2cc0df1f93ff/call-singlem/tmp.8249a9e0/_raw_extracted_reads.fa3bu162jc\' returned non-zero exit status 101.\nSTDERR was: b"[2022-07-18T21:56:43Z INFO mfqe] Read in 1997 read names from /dev/stdin\n[2022-07-18T21:56:43Z INFO mfqe] Iterating input FASTQ file\nthread \'main\' panicked at \'called `Result::unwrap()` on an `Err` value: UnexpectedEnd { line: 7183 }\', src/main.rs:316:25\nnote: run with `RUST_BACKTRACE=1` environment variable to display a backtrace\n"STDOUT was: b\'\'\n'STDOUT was: b''
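Because the failing runs succeed when re-run, a retry around the SingleM call can serve as a stopgap while the root cause is tracked down. A minimal Python sketch: `run_with_retries` is a hypothetical wrapper, and the command passed to it is whatever singlem pipe invocation the workflow already uses (no SingleM flags are assumed here). If the jobs run under Cromwell, its maxRetries runtime attribute may be a simpler place to do the same thing.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay_seconds=30.0):
    """Run `cmd` (a list of arguments), retrying on a non-zero exit status.

    Hypothetical wrapper. Returns the CompletedProcess of the first
    successful attempt; re-raises the last CalledProcessError if all fail.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return subprocess.run(cmd, check=True, capture_output=True, text=True)
        except subprocess.CalledProcessError as err:
            last_error = err
            # Log every failure so the underlying failure rate stays visible
            print(f"attempt {attempt}/{attempts} failed with exit status {err.returncode}",
                  file=sys.stderr)
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

Retrying hides the symptom rather than fixing it, so it is worth keeping the per-attempt log and the captured stderr from failed attempts for reporting upstream.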