nbokulich / short-read-tax-assignment

A repository for storing code and data related to a systematic comparison of short read taxonomy assignment tools
BSD 3-Clause "New" or "Revised" License

mock community data generation notebook? #1

Closed nbokulich closed 7 years ago

nbokulich commented 7 years ago

@gregcaporaso @benkaehler should we add a notebook demonstrating how to generate (from raw mock community fastqs) empty biom tables and seq rep sets ready for taxonomy assignment?

My initial feeling is that this may be unnecessary, given that it is such a basic process and we just supply these materials for multiple datasets ready for analysis.

However, we might want to update these materials anyway, using QIIME2 instead of QIIME1 (though this carries the risk that we may need to change some functions in the existing notebooks).

Thoughts?

gregcaporaso commented 7 years ago

I think it would be a good idea to update these to use QIIME 2. One benefit of doing that is that we wouldn't really need to provide notebooks (though a markdown file would be nice); we could just provide links that would allow viewing of artifact provenance. What would you guys need to make this happen?

nbokulich commented 7 years ago

Sounds good, though until we have a provenance2pipeline converter, a notebook or markdown file would make it easier to replicate this pipeline on other mock communities.

I will just start out by making a notebook to organize my own thoughts, and then we can delete it or convert it to markdown later if we deem it unnecessary.

I should be able to figure out a pipeline for this with the Artifact API, though if you or someone else already has a notebook or list of Artifact API commands for processing fastq --> feature table (I've been doing this all via the CLI), that will speed things up.

gregcaporaso commented 7 years ago

Great - doing this with notebooks and the artifact API would actually be pretty valuable (for testing, and potentially to port to QIIME 2 documentation) so I think it's a good idea. I don't have those notebooks, though @johnchase might have some that he can share with you. If not, you're just translating CLI to artifact API calls, so once you figure it out for one command it'll translate easily to the others.

nbokulich commented 7 years ago

What file formats should we deposit in the repo? We currently have QIIME 1 biom tables for each mock community. Should we convert the QIIME 2 artifacts that I generate into biom tables? That way we support more general use, and users don't need QIIME 2 installed to use the data.

gregcaporaso commented 7 years ago

What if we provided both the biom file exported from the qza and the qza itself? That way people who do want to use them with QIIME 2 don't lose the provenance of the artifacts, but it's still easy for non-QIIME 2 users.
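For anyone who wants the biom file but doesn't have QIIME 2 installed, a .qza can be unpacked with the Python standard library alone, because an artifact is a zip archive. The sketch below assumes the usual layout (a single top-level `<uuid>/` directory whose `data/` subdirectory holds the payload, e.g. `feature-table.biom` for a feature table); the function name `extract_qza_data` is hypothetical. When QIIME 2 is available, `qiime tools export` is the supported route.

```python
import zipfile
from pathlib import Path


def extract_qza_data(qza_path, out_dir, filename="feature-table.biom"):
    """Hypothetical helper: copy one payload file out of a .qza archive.

    Assumes the common .qza layout: <uuid>/data/<filename>, with
    provenance stored separately under <uuid>/provenance/.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(qza_path) as zf:
        # Match only top-level payload members, e.g. "<uuid>/data/feature-table.biom";
        # files nested under provenance/ have a different path shape and are skipped.
        matches = [name for name in zf.namelist()
                   if name.split("/")[1:] == ["data", filename]]
        if len(matches) != 1:
            raise ValueError(
                f"expected one data/{filename} in {qza_path}, found {len(matches)}")
        target = out_dir / filename
        target.write_bytes(zf.read(matches[0]))
    return target
```

The same helper should work for representative sequences by passing `filename="dna-sequences.fasta"`, again assuming the standard payload name for that artifact type.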

nbokulich commented 7 years ago

Sounds like a plan. I can do the same with the sample metadata and tree files. Thanks!

gregcaporaso commented 7 years ago

Thanks! Note that the sample metadata should be the same file for both QIIME 1 and QIIME 2.

nbokulich commented 7 years ago

@gregcaporaso notebook is done! Notebook here and relevant code here.

This should be easy to adapt to other files, though it is currently set up for batch processing.

All mock communities are now processed, and new data is located here. (Note that the old mock community dirs are still there; I will delete them once I get taxonomy assignments for the new data.)

Let me know what you think! I hope it's not too ugly...

gregcaporaso commented 7 years ago

Looks good from a quick glance @nbokulich, would you like me to do a detailed review of this now?

nbokulich commented 7 years ago

If you have the time; if not, this is not holding me up.

gregcaporaso commented 7 years ago

OK, then I'll plan to do it once we've progressed a little further with the analysis of the new classifiers, as I'm going to be pretty tied up with travel this week.

nbokulich commented 7 years ago

done