ohnosequences / mg7

Configurable and scalable 16S metagenomics data analysis
https://goo.gl/y3rZFD
GNU Affero General Public License v3.0

Review test input data at S3 #106

Closed eparejatobes closed 7 years ago

eparejatobes commented 8 years ago

After #104, we need to check that everything is where it should be.

marina-manrique commented 8 years ago

The raw Illumina datasets for running the tests are here: s3://era7p/mg7-test/data/in/

The datasets for the PacBio tests are here: s3://era7p/pacbio/data/in/ (the files ending with 16S.fastq.gz).

eparejatobes commented 8 years ago

OK good; we just need to put them where the code says they are.

marina-manrique commented 8 years ago

@eparejatobes where exactly? I found this: "resources.ohnosequences.com", mg7.organization)/mg7.artifact/ but I don't know the exact values for mg7.organization and mg7.artifact.

laughedelic commented 8 years ago

@marina-manrique

But I think what @eparejatobes meant is to synchronize the locations (the code vs. wherever we want to keep this data), not just to put the data where the code says, because the code currently refers to some strange old locations.

laughedelic commented 8 years ago

You can check current actual locations for input data in sbt with this snippet:

> ohnosequences.test.mg7.BeiMockPipeline.inputSamples.foreach { case (sampleID, (l, r)) => println(s"${sampleID}\n  ${l.resource}\n  ${r.resource}") }

Here's example output:

ERR1049996
  s3://era7p/mg7-test/data/out/reads-preprocessing/ERR1049996_1_val_1.fq.gz
  s3://era7p/mg7-test/data/out/reads-preprocessing/ERR1049996_2_val_2.fq.gz

Same for other pipelines.
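To compare those locations against an agreed prefix without reading the output by eye, something like the following could work. This is only a sketch: `S3Resource`, `InputLocations` and `misplaced` are hypothetical names made up for illustration, and the map only imitates the shape of `inputSamples` (sample ID mapped to a pair of read files), so the real types in the repository may differ.

```scala
// Hypothetical stand-in for the resource type; not the real mg7 class.
final case class S3Resource(resource: String)

object InputLocations {
  // Assumed shape: sample ID -> (first reads file, second reads file).
  // The paths are the ones from the example output above.
  private val err1049996: (S3Resource, S3Resource) = (
    S3Resource("s3://era7p/mg7-test/data/out/reads-preprocessing/ERR1049996_1_val_1.fq.gz"),
    S3Resource("s3://era7p/mg7-test/data/out/reads-preprocessing/ERR1049996_2_val_2.fq.gz")
  )

  val inputSamples: Map[String, (S3Resource, S3Resource)] =
    Map("ERR1049996" -> err1049996)

  // IDs of samples that have at least one input file outside the given prefix.
  def misplaced(
    samples: Map[String, (S3Resource, S3Resource)],
    prefix: String
  ): List[String] =
    samples.collect {
      case (id, (l, r))
          if !l.resource.startsWith(prefix) || !r.resource.startsWith(prefix) => id
    }.toList.sorted
}
```

Calling `InputLocations.misplaced(InputLocations.inputSamples, "s3://era7p/mg7-test/data/in/")` lists the samples whose files are not under that prefix. Which prefix is the right one to check against is still a decision for this issue.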

marina-manrique commented 8 years ago

OK, I'll check this with @eparejatobes later. Thanks, @laughedelic!

marina-manrique commented 8 years ago

I've put the input data here

@eparejatobes should I change something in the code?

eparejatobes commented 8 years ago

@marina-manrique update/fix these two pipeline definitions here:

marina-manrique commented 8 years ago

@rtobes @eparejatobes I've just realised that, for Illumina, we tested the tool with preprocessed reads. So far we only have preprocessed data for sample ERR1049996. Do you want me to run the same preprocessing on the rest of the Illumina samples so that we can test MG7 with all the Illumina datasets?

laughedelic commented 8 years ago

This needs to be updated to the current master (after major changes in #112). I'll do it later.