Review test input data at S3

ohnosequences / mg7

Configurable and scalable 16S metagenomics data analysis

https://goo.gl/y3rZFD

GNU Affero General Public License v3.0

Review test input data at S3 #106

Closed eparejatobes closed 7 years ago

eparejatobes commented 8 years ago

After #104, we need to check that everything is where it should be.

marina-manrique commented 8 years ago

The raw Illumina datasets for running the tests are here: s3://era7p/mg7-test/data/in/

and the datasets for the PacBio tests are here: s3://era7p/pacbio/data/in/ (the files ending with 16S.fastq.gz)

eparejatobes commented 8 years ago

OK good; we just need to put them where the code says they are.

marina-manrique commented 8 years ago

@eparejatobes where exactly? I found this in the code: "resources.ohnosequences.com", mg7.organization)/mg7.artifact/ but I don't know the exact values of mg7.organization and mg7.artifact

laughedelic commented 8 years ago

@marina-manrique

organization is ohnosequences
artifact is mg7
version is whatever sbt version tells you

But I think what @eparejatobes meant is to synchronize the locations (the code vs. wherever we want to keep this data), not just to put the data where the code says, because the code currently refers to some strange old locations.
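To make the substitution concrete, here is a minimal sketch of how such an S3 prefix could be composed from those values. The `S3Paths` object and its `s3Folder` helper are hypothetical names made up for illustration, not mg7's API, and the fragment quoted above is truncated, so whether a version segment or anything else follows the artifact is not shown here.

```scala
// Hypothetical helper, not part of mg7: joins a bucket and path segments
// into an S3 folder URI with a trailing slash.
object S3Paths {
  def s3Folder(bucket: String, segments: String*): String =
    (s"s3://$bucket" +: segments).mkString("/") + "/"
}

object S3PathsDemo {
  def main(args: Array[String]): Unit = {
    // Values taken from the comment above.
    val organization = "ohnosequences"
    val artifact     = "mg7"

    // Bucket name as it appears in the quoted code fragment.
    println(S3Paths.s3Folder("resources.ohnosequences.com", organization, artifact))
  }
}
```

With these values the prefix comes out as s3://resources.ohnosequences.com/ohnosequences/mg7/.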

laughedelic commented 8 years ago

You can check the locations the code currently uses for input data in sbt with this snippet:

> ohnosequences.test.mg7.BeiMockPipeline.inputSamples.foreach { case (sampleID, (l, r)) => println(s"${sampleID}\n  ${l.resource}\n  ${r.resource}") }

Here's example output:

ERR1049996
  s3://era7p/mg7-test/data/out/reads-preprocessing/ERR1049996_1_val_1.fq.gz
  s3://era7p/mg7-test/data/out/reads-preprocessing/ERR1049996_2_val_2.fq.gz

Same for other pipelines.
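For readers without the project at hand, the following is a rough, self-contained sketch of the data shape that snippet appears to assume: a collection of sample IDs, each paired with two reads files exposing a `resource`. The `ReadsFile` case class and the `InputListing` object are hypothetical stand-ins, and the real mg7 types are certainly richer. The sample values are copied from the example output above.

```scala
// Hypothetical minimal model of a reads file; only the `resource` field
// used by the snippet above is represented.
final case class ReadsFile(resource: String)

object InputListing {
  // Renders samples in the same layout as the example output:
  // the sample ID, then each paired-end file indented by two spaces.
  def describe(samples: Seq[(String, (ReadsFile, ReadsFile))]): String =
    samples.map { case (sampleID, (l, r)) =>
      s"$sampleID\n  ${l.resource}\n  ${r.resource}"
    }.mkString("\n")
}

object InputListingDemo {
  def main(args: Array[String]): Unit = {
    val base = "s3://era7p/mg7-test/data/out/reads-preprocessing"
    val samples = Seq(
      "ERR1049996" -> (
        ReadsFile(s"$base/ERR1049996_1_val_1.fq.gz"),
        ReadsFile(s"$base/ERR1049996_2_val_2.fq.gz")
      )
    )
    println(InputListing.describe(samples))
  }
}
```

Returning a string instead of printing inside the loop makes the listing easy to compare against an expected location list.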

marina-manrique commented 8 years ago

OK, I'll check this with @eparejatobes later. Thanks, @laughedelic!

marina-manrique commented 8 years ago

I've put the input data here:

Illumina datasets: s3://resources.ohnosequences.com/ohnosequences/mg7/mock-communities-data/illumina/
PacBio datasets: s3://resources.ohnosequences.com/ohnosequences/mg7/mock-communities-data/pacbio/

@eparejatobes should I change something in the code?
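One way to see whether the code needs changing is to compare the resources it lists against the new prefix. The sketch below assumes the resources are available as plain strings (for instance, copied from the sbt listing earlier in this thread); `LocationCheck` and `outsidePrefix` are hypothetical names, not part of mg7.

```scala
// Hypothetical check, not part of mg7: returns the resources that do NOT
// live under the expected S3 prefix, preserving their order.
object LocationCheck {
  def outsidePrefix(resources: Seq[String], prefix: String): Seq[String] =
    resources.filterNot(_.startsWith(prefix))
}

object LocationCheckDemo {
  def main(args: Array[String]): Unit = {
    // New Illumina location from this comment.
    val illumina =
      "s3://resources.ohnosequences.com/ohnosequences/mg7/mock-communities-data/illumina/"

    // Locations the code reported in the example output earlier in the thread.
    val current = Seq(
      "s3://era7p/mg7-test/data/out/reads-preprocessing/ERR1049996_1_val_1.fq.gz",
      "s3://era7p/mg7-test/data/out/reads-preprocessing/ERR1049996_2_val_2.fq.gz"
    )

    // Anything printed here still points at an old location.
    LocationCheck.outsidePrefix(current, illumina).foreach(println)
  }
}
```

Both entries from the earlier example output fall outside the new prefix, which suggests the pipeline definitions still point at the old bucket.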

eparejatobes commented 8 years ago

@marina-manrique update/fix these two pipeline definitions here:

marina-manrique commented 8 years ago

@rtobes @eparejatobes I've just realised that, for Illumina, we tested the tool with preprocessed reads. So far we only have the preprocessed data for sample ERR1049996. Do you want me to run the same preprocessing on the rest of the Illumina samples so we can test MG7 with all the Illumina datasets?

laughedelic commented 8 years ago

This needs to be updated to the current master (after major changes in #112). I'll do it later.