microsud / chkMocks

Check your mock communities from microbiome sequencing experiments
https://microsud.github.io/chkMocks/

Collaboration with OSMS for added functionality #1

Open microsud opened 3 years ago

microsud commented 3 years ago

Added functionality where users can analyse Zymo standard mocks for taxonomic assignment accuracy. Ref: https://github.com/OxfordCMS/OCMS_zymoBIOMICS Development branch: https://github.com/microsud/chkMocks/tree/OCMS_zymoBIOMICS

cc: @nickilott @sfeds

nickilott commented 3 years ago

Hey! I have been chatting with @schyen re adding functionality that we think might be useful. As discussed in the meeting, we think it would be good to have a system where people can compare their results (correlation with expected) to a database of correlations (contributed by the community and us) to see where they fall after running chkMocks. We figure that the output could be something simple to start with, like in the image: a histogram of correlations. It would be great, though, to provide functionality for people to submit their own correlations to a central database that can be queried when people are running chkMocks. Would be great to hear what you think!

[image: histogram of correlations]

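A minimal sketch of this idea in base R; the correlations and the `my_cor` value below are invented placeholders, not real community submissions:

```r
# Sketch of the proposed comparison: place a user's expected-vs-observed
# correlation within a community-contributed distribution.
# All values below are invented placeholders, not real submissions.
db_correlations <- c(0.81, 0.88, 0.90, 0.93, 0.95, 0.97)  # hypothetical DB
my_cor <- 0.92                                            # user's own mock

# Where does my_cor fall relative to the community?
percentile <- mean(db_correlations <= my_cor)

# Histogram of community correlations with the user's value marked
hist(db_correlations, main = "Expected vs observed correlations",
     xlab = "Correlation", col = "grey80")
abline(v = my_cor, col = "red", lwd = 2)
```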
microsud commented 3 years ago

This is a good functionality to add. I think we should start with creating a phyloseq object of all samples in a database (Db), let's say mockDbPseq, where we can add the counts and related metadata in sample_data:

For users to use this in the way we intend them to use it:

As a next step we can think of:

For viz: I definitely see the usefulness of the plot you showed, where the user straight away sees what they have. Let me know if you think I missed some information.
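A rough sketch of what the sample_data slot of a mockDbPseq object could carry; the column names are assumptions, and a plain data.frame is used so the example is self-contained without phyloseq installed:

```r
# Hypothetical sample_data schema for a shared mock database ("mockDbPseq");
# a plain data.frame is used here so the sketch runs without phyloseq.
mock_db_samples <- data.frame(
  sample_id    = c("zymo_run1", "zymo_run2"),   # placeholder IDs
  mock_type    = c("ZymoBIOMICS", "ZymoBIOMICS"),
  region_16s   = c("V3-V4", "V4"),              # sequenced 16S region
  cor_expected = c(0.94, 0.88),                 # observed vs expected correlation
  stringsAsFactors = FALSE
)
# In phyloseq terms this would become sample_data(mock_db_samples),
# paired with an otu_table of counts for each sample.
```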

Cheers, Sudarshan

schyen commented 3 years ago

I guess it's a balance between capturing metadata and gathering community contributions of standards results to gain a wider picture of how mock communities perform. Personally, when I have to add in more metadata, I'm less likely to make a submission/contribution.

I think collecting a sample's correlation of expected vs. observed is the bare minimum of what I would want to analyse (in terms of looking at how my standard compared to others'). Setting up a database that allows for community contribution and updates on the fly is an interesting problem, and I'd be happy to spend some time looking for an effective way to gather, store, and host this data (as a phyloseq object or otherwise).

I think once we have the data storage figured out, it would be easiest to start comparing and adding data to it via chkMocks. I'm envisioning a pair of functions such as compare_to_db() and submit_to_db() within chkMocks. That way, any information used in analysis with chkMocks can be transferred to the submit function without the user having to fill out additional fields to perform a submission. For example, observed vs. expected relative abundance of each bug within your sample would be an analysis done within chkMocks, so it wouldn't be a huge leap to transfer that relative abundance data into the submission and add it to the database. From there, I would prioritise which types of metadata would be most pertinent to include (e.g. the 16S region sequenced is more likely to influence standards results than sequencing chemistry) and find a balance of how much metadata to ask for.

One example flow of data would be:

```r
my_standards <- chkMocks::perform_correlation()
compare_to_db(my_standards)
submit_to_db(my_standards)
```
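A toy sketch of what that pair could look like; the names, signatures, and in-memory default db are assumptions, since real storage is still to be decided:

```r
# Toy versions of the proposed pair; names, signatures, and the in-memory
# default db are assumptions -- real storage is still to be decided.
compare_to_db <- function(my_cor, db = c(0.85, 0.90, 0.94, 0.96)) {
  # Rank the user's correlation against community submissions
  list(percentile = mean(db <= my_cor), n_db = length(db))
}

submit_to_db <- function(my_cor, db) {
  # Append the user's correlation and return the updated pool
  c(db, my_cor)
}

res <- compare_to_db(0.95)
```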

I'd love to hear your thoughts on this!

Sandi

microsud commented 3 years ago

@schyen These are good points to consider!

> I think collecting a sample's correlation of expected vs. observed is the bare minimum of what I would want to analyse (in terms of looking at how my standard compared to others'). Setting up a database that allows for community contribution and updates on the fly is an interesting problem, and I'd be happy to spend some time looking for an effective way to gather, store, and host this data (as a phyloseq object or otherwise).

Agree. For now, we should keep it simple and intuitive.

> I think once we have the data storage figured out, it would be easiest to start comparing and adding data to it via chkMocks. I'm envisioning a pair of functions such as compare_to_db() and submit_to_db() within chkMocks. That way, any information used in analysis with chkMocks can be transferred to the submit function without the user having to fill out additional fields to perform a submission. For example, observed vs. expected relative abundance of each bug within your sample would be an analysis done within chkMocks, so it wouldn't be a huge leap to transfer that relative abundance data into the submission and add it to the database. From there, I would prioritise which types of metadata would be most pertinent to include (e.g. the 16S region sequenced is more likely to influence standards results than sequencing chemistry) and find a balance of how much metadata to ask for.

So to begin with we can focus on what we have and think about community contributions later...

```r
my_standards <- chkMocks::perform_correlation()
compare_to_db(my_standards)
submit_to_db(my_standards)
```

I agree with the approach you suggest. Just curious: have you tested chkMocks on mock communities from your lab? Would it be possible to add those as the first samples in the database? Suggestions are welcome. Whenever you are ready, you can create a PR (one function at a time 😄) and we can discuss the implementation. You can check the package docs and code for the naming convention and the roxygenize() documentation style currently used.
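For reference, a roxygen2-style skeleton in the spirit of the package's documentation conventions; the function name and its fields are illustrative placeholders, not actual chkMocks code:

```r
# Placeholder roxygen2 skeleton showing the documentation style;
# the function and its fields are illustrative, not actual chkMocks code.

#' Add a mock community result to the reference database
#'
#' @param new_cor Numeric. Observed vs expected correlation for a new mock.
#' @param db Numeric vector of previously contributed correlations.
#'
#' @return Numeric vector: the database with \code{new_cor} appended.
#' @examples
#' add_mock_to_db(0.92, c(0.85, 0.95))
add_mock_to_db <- function(new_cor, db) {
  c(db, new_cor)
}
```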

schyen commented 3 years ago

I've cloned chkMocks and plan on adding functions based on the OCMS zymobiomics report. I'll check out the documentation to try to follow your code management conventions.

Yes I'm planning on running our positive controls on chkMocks. It'll be good for me to reproduce your samples' analysis on our local system too :)