Workflows - Test small data sets for workflows (mg_annotation)

microbiomedata / mg_annotation

Metagenome Annotation Workflow


Workflows - Test small data sets for workflows (mg_annotation) #5

Closed ssarrafan closed 3 years ago

ssarrafan commented 3 years ago

Create a small test data set, together with the outcome expected from a successful workflow run.

Link the test results to this issue.

hubin-keio commented 3 years ago

Shane, please point me to the location of two test data sets (a smaller one for a quick test and a small, realistic one for a thorough test) and the expected outcome of each workflow run in JSON format. I will work on a unit-test-like solution for testing. Thank you.
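A unit-test-like check of this kind could be sketched roughly as below. It assumes each workflow run can be summarized as a JSON file whose top level is an object. The key names, the `VOLATILE_KEYS` set, and the helper functions are all hypothetical and not part of mg_annotation.

```python
import json
from pathlib import Path

# Hypothetical keys that legitimately differ between runs (assumption, not from the pipeline).
VOLATILE_KEYS = {"started_at", "ended_at"}

# Sentinel so that a missing key is distinguishable from an explicit null value.
_MISSING = object()


def strip_volatile(value):
    """Recursively drop volatile keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


def compare_json_files(expected_path, actual_path):
    """Return the sorted top-level keys whose values differ between two JSON files."""
    expected = strip_volatile(json.loads(Path(expected_path).read_text()))
    actual = strip_volatile(json.loads(Path(actual_path).read_text()))
    keys = set(expected) | set(actual)
    return sorted(
        k for k in keys
        if expected.get(k, _MISSING) != actual.get(k, _MISSING)
    )
```

An empty list from `compare_json_files` would count as a pass. Returning the differing keys, rather than a bare boolean, makes a failed run easier to diagnose.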

scanon commented 3 years ago

I've been using these...

/global/cfs/cdirs/m3408/aim2/testing/2021-01-22/241870
/global/cfs/cdirs/m3408/aim2/testing/2021-01-22/241871

But I need to verify with Marcel if those can be made public or not.

scanon commented 3 years ago

The JSON output is not currently part of the pipeline.

mhuntemann commented 3 years ago

No, those datasets cannot be made public. I just picked them because they were some of the most recent ones processed with the latest version of the pipeline at that time. They were only meant to be used for internal comparison. If we need something that we can make public, maybe Reddy or Amy can help us identify a few differently sized metagenomes that are actually published and completely free to use.

scanon commented 3 years ago

I'm using some data sets from Stegen that were already part of NMDC. So those should be fine. I'll make a PR that adds the new testing WDLs.

ssarrafan commented 3 years ago

@scanon and @hubin-keio can I move this to the May sprint or do you consider this to be completed?

ssarrafan commented 3 years ago

Moving to May sprint. Please close if this is done.