qiita-spots / qiita

Qiita - A multi-omics databasing effort
http://qiita.microbio.me
BSD 3-Clause "New" or "Revised" License

dbBact's wordclouds for Qiita #3380

Closed sjanssen2 closed 4 months ago

sjanssen2 commented 7 months ago

In my collaborations, I often encounter situations where PhD candidates are volunteered by their PIs to also handle amplicon analysis, but are total microbiome newbies. Situations might be complicated, because others might have done sample collection, sequencing was outsourced, ... Without any experience, they now have to sanity check whether the sequencing was successful, and PIs too often love to take shortcuts, like expired flowcells, excessive multiplexing, ... Sanity checking is extremely hard, if not impossible, without having seen many OKish datasets.

I was recently contacted by Amnon and colleagues, who finally published dbBact. In a nutshell: they collect expert knowledge for individual ASV sequences. I found their wordclouds (i.e. enriched terms of the ASVs in a feature table, or more precisely: in the rep set) extremely helpful to characterize a sequencing run / prep without relying on the metadata, which are sadly too often wrong or incomplete. It is quite easy to see if a prep holds samples from e.g. mouse or soil or ....

Therefore, I'd like to integrate these wordclouds into Qiita and wonder what the best strategy would be. Here are my thoughts:

https://dbbact.org/sequences_fscores

To use it, just supply the JSON parameter 'sequences', which is a list of sequences (ACGT strings that start at one of the supported dbBact regions).

example:
import requests

# each sequence must start at one of the regions supported by dbBact
seqs = ['TACGGAGGGTGCAAGCGTTGTCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGCTTTTTAAGTCTGGGGTGAAAGCCCGTTGCTCAACAACGGAACTGCCCTGGAAACTGGAGAGCTTGAGTACAGACGAGGGTGGCGGAATGGACGG']
res = requests.get('https://dbbact.org/sequences_fscores', json={'sequences': seqs})
print(res.json())
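For a whole rep set rather than a single sequence, the same call could be wrapped roughly as in the sketch below. Only the URL and the 'sequences' parameter come from the example above; the helper names `build_payload` and `fetch_fscores`, the upper-casing and de-duplication, and the timeout are assumptions made for illustration. The layout of the returned JSON is not described here, so the sketch does not interpret it.

```python
DBBACT_FSCORES_URL = "https://dbbact.org/sequences_fscores"  # endpoint quoted above
VALID_BASES = set("ACGT")


def build_payload(sequences):
    """Return the JSON payload for dbBact (hypothetical helper).

    Sequences are stripped, upper-cased and de-duplicated (first occurrence
    wins). Raises ValueError for empty or non-ACGT sequences.
    """
    cleaned = []
    seen = set()
    for seq in sequences:
        seq = seq.strip().upper()
        if not seq or not set(seq) <= VALID_BASES:
            raise ValueError(f"not a plain ACGT sequence: {seq!r}")
        if seq not in seen:
            seen.add(seq)
            cleaned.append(seq)
    return {"sequences": cleaned}


def fetch_fscores(sequences, timeout=30):
    """Query dbBact and return the decoded JSON as-is (hypothetical helper)."""
    # third-party dependency, imported here so build_payload works without it
    import requests

    res = requests.get(DBBACT_FSCORES_URL,
                       json=build_payload(sequences),
                       timeout=timeout)
    res.raise_for_status()  # fail loudly on HTTP errors
    return res.json()
```

Validating before sending keeps obviously broken feature IDs (e.g. hashes instead of sequences) from reaching the server; whether dbBact itself would reject them is not something I have checked.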

I'd be happy to know your opinion @antgonza before I start implementing. Thank you!

antgonza commented 6 months ago

Hi @sjanssen2, thank you for your question - this is an interesting one!

First, would you consider this a processing or an analytical tool? By processing I mean transforming (raw) sequences into feature tables; by analytical I mean something that you apply to the feature table downstream in your analytical pipeline.

Based on the description provided here, I think it is more analytical, as it gets applied to the deblur results, mainly a feature table; what do you think? If you agree, then this plugin should be part of the analysis and not of the processing, as the 3 options you present show.

All analyses are done via QIIME 2, so if you would like to add it you would need to have a q2 plugin and then add it to the qiita QIIME 2 plugin, like this. In case it helps, here are a couple of examples of how to add a QIIME 2 plugin to the qiita QIIME 2 plugin:

If this is the route you prefer, your plugin could simply "summarize" a feature table via dbBact, i.e. create a qzv.
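To make the "summarize" idea a bit more concrete: a QIIME 2 visualizer is essentially a function that receives an output directory and writes an index.html into it. The sketch below shows only that rendering step in plain Python, without the plugin registration. It assumes the dbBact result has already been reduced to a mapping of term to score; that mapping, the function names and the pixel range are all hypothetical and not taken from dbBact or QIIME 2.

```python
import html
import os


def scale_font_sizes(term_scores, min_px=12, max_px=48):
    """Map each term's score linearly onto a font size in [min_px, max_px]."""
    if not term_scores:
        return {}
    lo, hi = min(term_scores.values()), max(term_scores.values())
    span = hi - lo
    sizes = {}
    for term, score in term_scores.items():
        # if all scores are equal, every term gets the largest size
        frac = (score - lo) / span if span else 1.0
        sizes[term] = round(min_px + frac * (max_px - min_px))
    return sizes


def write_wordcloud(output_dir, term_scores):
    """Write a very plain HTML word cloud to output_dir/index.html."""
    sizes = scale_font_sizes(term_scores)
    # highest-scoring terms first, ties broken alphabetically
    ordered = sorted(sizes, key=lambda t: (-term_scores[t], t))
    spans = [
        f'<span style="font-size:{sizes[t]}px">{html.escape(t)}</span>'
        for t in ordered
    ]
    path = os.path.join(output_dir, "index.html")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("<html><body><p>" + " ".join(spans) + "</p></body></html>\n")
    return path
```

A real plugin would additionally register such a function as a visualizer that takes the feature table as input, and would probably use a proper word cloud layout instead of inline font sizes.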

Hope this helps.

antgonza commented 4 months ago

Closing for now, please reopen if you have further questions.