kassammo closed this issue 5 years ago
Dear Mohammed,
Currently we only report taxonomic ranks that are included as such in the NCBI taxonomy. Those would be superkingdom, phylum, class, order, family, genus and species. The main reason is that there can be an arbitrary number of "no_rank" entries between two ranked entries (e.g. corresponding to subclasses or superfamilies). These secondary ranks are very hard to work with when designing a Last Common Ancestor algorithm, because they may or may not be present in a given taxon.
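To illustrate why a fixed set of ranks makes this easier, here is a minimal sketch of a last-common-ancestor computation over lineages reduced to the main ranks. It is not the SqueezeMeta implementation. The function names and the list-of-pairs representation are assumptions made for the example, the rank list follows the lineage strings SqueezeMeta prints (which include order), and the second lineage is a made-up hit used only for illustration.

```python
# Illustrative sketch only; NOT the SqueezeMeta implementation.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def keep_main_ranks(lineage):
    """Keep only the (rank, name) pairs whose rank is in RANKS (drops 'no rank')."""
    wanted = set(RANKS)
    return {rank: name for rank, name in lineage if rank in wanted}


def last_common_ancestor(lineages):
    """Return the shared (rank, name) pairs, from superkingdom downwards."""
    reduced = [keep_main_ranks(lineage) for lineage in lineages]
    shared = []
    if not reduced:
        return shared
    for rank in RANKS:
        names = {r.get(rank) for r in reduced}
        # Stop at the first rank that is missing somewhere or disagrees.
        if len(names) != 1 or None in names:
            break
        shared.append((rank, names.pop()))
    return shared


# One lineage carries a 'no rank' entry, the other (hypothetical) one does not.
hit_a = [
    ("superkingdom", "Bacteria"), ("no rank", "Terrabacteria group"),
    ("phylum", "Actinobacteria"), ("class", "Actinobacteria"),
    ("order", "Micrococcales"), ("family", "Micrococcaceae"),
    ("genus", "Rothia"), ("species", "Rothia mucilaginosa"),
]
hit_b = [
    ("superkingdom", "Bacteria"),
    ("phylum", "Actinobacteria"), ("class", "Actinobacteria"),
    ("order", "Micrococcales"), ("family", "Micrococcaceae"),
    ("genus", "Rothia"), ("species", "Rothia dentocariosa"),
]

lca = last_common_ancestor([hit_a, hit_b])
print(lca[-1])  # the deepest rank shared by both hits
```

Because every lineage is reduced to the same fixed ranks, the two hits can be compared position by position even though only one of them has the extra "no rank" entry.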
To get the ORFs with hits to Rothia mucilaginosa DY-18 you can do the following with a bit of script fu:
Hello,
Thanks for the answer. I am not looking only at Rothia mucilaginosa DY-18 in particular. I wanted to compare against the results I got from kraken, to see how similar they are.
So this means it is not possible to go beyond the species level.
Second question: how difficult would it be to use a personal database for the taxonomy?
Thanks
Our recent preprint actually compares the results of several taxonomic and functional annotation pipelines, including the one we use for SqueezeMeta and kraken (https://www.biorxiv.org/content/biorxiv/early/2019/01/16/522292.full.pdf), although not down to the strain level.
Regarding the personal taxonomy database, it's not impossible, but it's quite hard. In particular, the taxonomic ranks used for classification are currently hardcoded (so we can use the per-rank identity thresholds discussed in https://academic.oup.com/nar/article/42/8/e73/1076763). So it would take a fairly large hack to get our taxonomy to the strain level.
Once again, I wouldn't personally recommend doing unsupervised homology-based strain-level taxonomy in metagenomes, unless you get really good results on complex mock communities first. We're currently testing DESMAN (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5607848/), which resolves strains by looking at variants in core genes. So far we're getting good results, and we might consider including it in the SqueezeMeta pipeline in the future.
Hello, thank you for all the information. I would be really happy to test a beta version of SqueezeMeta if one becomes available. I really like SqueezeMeta, especially for functional analysis and binning. The one thing that is still difficult is the taxonomy assignment, because it is the first thing we look at after the first assembly. Thanks
A quick question: if I want to try DESMAN, which file should I give for the configs?
Thanks
Mohamed
We're not testing DESMAN with snakemake, so we can't provide you with a config.json file at the moment. We are currently writing and testing some scripts to generate DESMAN input files from the SqueezeMeta data. We roughly follow the tutorial at https://github.com/chrisquince/DESMAN/tree/master/complete_example, starting from the ExtractCountFreqGenes.py script. All the input data can be obtained by parsing SqueezeMeta results. The SAM files are in the <project_name>/data directory, and you can convert them into sorted BAM files with samtools.
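The SAM-to-sorted-BAM conversion could look roughly like the sketch below, which wraps `samtools sort` and `samtools index` with Python's `subprocess`. It assumes samtools is installed and on the PATH. The helper names, the thread count and the `.sorted.bam` naming are choices made for this example, not something SqueezeMeta or DESMAN prescribes.

```python
import subprocess
from pathlib import Path


def sort_command(sam_path, bam_path, threads=4):
    """Build the 'samtools sort' call that writes a coordinate-sorted BAM."""
    return ["samtools", "sort", "-@", str(threads), "-o", str(bam_path), str(sam_path)]


def index_command(bam_path):
    """Build the 'samtools index' call for a sorted BAM file."""
    return ["samtools", "index", str(bam_path)]


def sam_to_sorted_bam(sam_path, out_dir, threads=4):
    """Sort one SAM file into out_dir and index the result (needs samtools)."""
    sam_path = Path(sam_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    bam_path = out_dir / (sam_path.stem + ".sorted.bam")
    subprocess.run(sort_command(sam_path, bam_path, threads), check=True)
    subprocess.run(index_command(bam_path), check=True)
    return bam_path


def convert_all(sam_dir, out_dir, threads=4):
    """Convert every *.sam file found in sam_dir; returns the BAM paths."""
    return [sam_to_sorted_bam(p, out_dir, threads) for p in sorted(Path(sam_dir).glob("*.sam"))]
```

Calling `convert_all` on the directory that holds the SAM files, with any output directory of your choice, would produce one sorted and indexed BAM file per sample.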
Ok I see.
I will try to install the tool. I will let you know if I need some input. Do you think that I can use the coassembly results for the SNP analysis?
Thanks
It's worth trying, although we ourselves are still familiarizing ourselves with DESMAN. I'd say that if you have good mapping percentages in your *mappingstat file, there is a good chance it will work.
Hello,
I have tried the co-assembly method. Looking into the classification details in the files 10. and 06., I noticed that I got the species but not the strain information in my classification. Is this normal? And how can I modify it to get more information about the strain?
For instance: I am expecting to get Rothia mucilaginosa DY-18 in my samples. I am getting only the information superkingdom:Bacteria;no rank:Terrabacteria group;phylum:Actinobacteria;class:Actinobacteria;order:Micrococcales;family:Micrococcaceae;genus:Rothia;species:Rothia mucilaginosa
I have checked on the alltaxlist.txt file from the tool and the strain is there: 680646 Rothia mucilaginosa DY-18 no rank
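For anyone wanting to run the same check programmatically, here is a small sketch that splits a lineage string like the one above into (rank, name) pairs and looks a name up in taxonomy-list lines. The function names are made up for the example, and the assumption that alltaxlist.txt is tab-separated as taxid, name, rank is a guess based on the single line quoted above.

```python
def parse_taxonomy(lineage):
    """Split 'rank:name;rank:name;...' into a list of (rank, name) tuples."""
    pairs = []
    for field in lineage.strip().split(";"):
        if not field:
            continue
        rank, _, name = field.partition(":")  # split on the first colon only
        pairs.append((rank.strip(), name.strip()))
    return pairs


def find_taxon(taxlist_lines, wanted_name):
    """Return (taxid, rank) for wanted_name, or None if it is not listed.

    ASSUMPTION: each line is 'taxid<TAB>name<TAB>rank'.
    """
    for line in taxlist_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[1] == wanted_name:
            return fields[0], fields[2]
    return None


lineage = (
    "superkingdom:Bacteria;no rank:Terrabacteria group;phylum:Actinobacteria;"
    "class:Actinobacteria;order:Micrococcales;family:Micrococcaceae;"
    "genus:Rothia;species:Rothia mucilaginosa"
)
parsed = parse_taxonomy(lineage)
print(parsed[-1])  # deepest rank reported for this ORF or contig

taxlist = ["680646\tRothia mucilaginosa DY-18\tno rank"]
print(find_taxon(taxlist, "Rothia mucilaginosa DY-18"))
```

The lookup shows the point of the question: the strain is present in the list, but with the rank "no rank", while the reported lineage stops at species.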
Thanks,
Mohamed