Hi,
Thanks for your interest and detailed questions.
For cluster, can I change the cluster standards, such as similarity 0.95?
Yes. Use --cluster_id. In case you didn't know, there's a full list of options when you run singlem summarise --full_help.
For rarefied, can the input_otu_tables be the clustered.otu_table.csv?
No, that file is the definition of the clusters. The actual result of the clustering is output on --output_otu_table.
And what is the principle to filter the 100 number sequence?
This is a rarefaction: a way of dealing with different sequencing depths when comparing relative abundances. This is a pretty standard thing in ecology, and it's no different here.
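To make the idea concrete, here is a minimal sketch of rarefaction in plain Python. It is not SingleM's own code: the function name `rarefy`, the dict-of-counts input, and the choice to return None for samples below the target depth are all assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads, without replacement, from {otu: count}.

    Hypothetical helper for illustration only. Returns None when the
    sample has fewer than `depth` reads in total (an assumed policy).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the table into one entry per read, then draw `depth` of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return dict(Counter(rng.sample(pool, depth)))


# Two samples of very different depth end up with the same number of reads.
deep = rarefy({"otu_a": 1500, "otu_b": 500, "otu_c": 100}, depth=100)
shallow = rarefy({"otu_a": 90, "otu_b": 30}, depth=100)
```

After rarefying, every retained sample has exactly `depth` reads, so differences in counts between samples are no longer driven by how deeply each one was sequenced.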
Does that answer all your questions? Thanks.
Hi, sorry for getting back to you so late. Yes, you have answered my questions and I will try it as you suggest. Thank you very much.
Hello! I think I have a similar question:
I would like to merge the output files from singlem summarise --input_otu_tables otu_table.csv other_samples.otu_table.csv --biom_prefix myprefix to calculate diversity metrics as in Woodcroft, B.J., Singleton, C.M., Boyd, J.A. et al. Nature 560, 49–54 (2018), and to do this, I am trying to get a weighted average of any taxa identified by more than one marker gene. Is the flag --cluster_id 0.95 the way to do this? Thanks! Laura
Hi Laura,
There's no easy way to combine sequences across marker genes (except via taxonomy, which isn't what you want). I'd instead suggest calculating the diversity metric for each marker gene, and then taking the mean of those results for each sample.
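As a rough sketch of that suggestion, the snippet below computes a Shannon index separately for each (sample, marker gene) pair and then averages across genes within each sample. Shannon is only an example metric, and the function names and the (gene, sample, count) row layout are assumptions for illustration, not something SingleM provides.

```python
import math
from collections import defaultdict


def shannon(counts):
    """Shannon diversity (natural log) of a list of OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mean_diversity_per_sample(rows):
    """rows: iterable of (gene, sample, count) tuples, one per OTU.

    Hypothetical helper: diversity is computed per marker gene first,
    and only the per-gene results are averaged for each sample.
    """
    counts_by_pair = defaultdict(list)
    for gene, sample, count in rows:
        counts_by_pair[(sample, gene)].append(count)

    values_by_sample = defaultdict(list)
    for (sample, _gene), counts in counts_by_pair.items():
        values_by_sample[sample].append(shannon(counts))

    return {s: sum(v) / len(v) for s, v in values_by_sample.items()}


rows = [
    ("gene1", "sampleA", 10), ("gene1", "sampleA", 10),
    ("gene2", "sampleA", 5), ("gene2", "sampleA", 5),
    ("gene2", "sampleA", 5), ("gene2", "sampleA", 5),
]
result = mean_diversity_per_sample(rows)
```

Because OTU sequences are never compared across genes, this avoids having to decide which OTUs from different markers belong to the same organism.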
HTH, ben
Hi Ben, thanks for the advice - I hadn't thought of that! Laura
Hi again Ben, a follow-up question: I am trying to put together an OTU table (with taxonomy) that is a combination of the biom tables generated for each single-copy marker gene. I have clustered the OTU table to 0.95 (which I am assuming represents species-level clustering), but I am concerned that there are taxonomic repeats between marker genes, and I am unsure how to deal with them. Do you have any advice? Thanks, Laura
Hi Laura,
Unfortunately I'm not sure I have a good answer. Multiple OTUs can have the same taxonomy, because taxonomy isn't necessarily down to the species level, for instance. It also isn't clear how you intend to combine the results from the different genes, since the taxonomy for the reads from a single species might be different across the different genes e.g. one might be at species level and another only genus level.
In the dev branch, @rossenzhao implemented a "condense" mode which does provide a single taxonomy table calculated from all of the genes. To do this you'd have to rerun singlem pipe on all your samples again, but luckily the dev pipe is like 95% faster. You'd also get the advantage of using GTDB r202 taxonomy, which is much better.
Let me know if you wanted to go that route and I can provide some further details. ben
Hi Ben, sure, I'd give the condense mode a try! I mainly want to run this data through lefse to pick out differentially enriched OTUs, and I was told that I would need to get the average counts of repeated taxa for that. Thanks, and sorry if this was confusing! Laura
Sure, OK, well check out the dev branch of this repo, and then use this "metapackage" with pipe: https://zenodo.org/record/6469357
Then run condense on the output of pipe. HTH. ben
Hi Ben, I'm a bit confused about singlem.