ohnosequences / mg7

Configurable and scalable 16S metagenomics data analysis
https://goo.gl/y3rZFD
GNU Affero General Public License v3.0

Test data pipelines #105

Closed: eparejatobes closed this 7 years ago

eparejatobes commented 8 years ago

This is a portmanteau issue covering

laughedelic commented 7 years ago

@marina-manrique results of the latest MG7 run on the illumina/pacbio mock data are in

s3://resources.ohnosequences.com/ohnosequences/mg7/1.0.0-M5-pr78-158-ge56bab7/test/

I'm releasing MG7 1.0.0-M5 based on the corresponding version and writing missing docs in this issue.

marina-manrique commented 7 years ago

Cool! I'm checking them today!

eparejatobes commented 7 years ago

I'm reviewing this.

eparejatobes commented 7 years ago

There are some serious issues with the output in s3://resources.ohnosequences.com/ohnosequences/mg7/1.0.0-M5-pr78-158-ge56bab7/test/pacbio/.

  1. All the direct count files have a line: ,Taxa,,,0.0000,NaN
  2. All frequencies are 0.0000 everywhere
  3. The average pident is "-" in some cases. Is this expected? If so, why?

@laughedelic please mark 1.0.0-M5 as broken, and open issues for all of the above.

laughedelic commented 7 years ago

@eparejatobes thanks for the feedback!

  1. [x] https://github.com/ohnosequences/mg7/issues/124: fixed
  2. [x] https://github.com/ohnosequences/mg7/issues/125: fixing
  3. [x] this is expected for the taxa that have no direct assignments (i.e. no hits, no pidents)

eparejatobes commented 7 years ago

Also I see a lot of no-hits. Which params were used here? m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs from staggered for example, has

m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000095DC7A,99.59,1474,4,2,1,1495,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000002B710,99.59,1474,4,2,1,1492,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00008E4995,99.59,1474,4,2,1,1501,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00002E8C4B,99.59,1474,4,2,1,1501,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00005C9B1E,99.59,1474,4,2,1,1482,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000003687D,99.59,1474,4,2,1,1493,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000520518,99.59,1474,4,2,1,1501,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000052F24,99.59,1474,4,2,1,1473,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000586E96,99.59,1474,4,2,1,1501,0.0,2687,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00006160CB,99.53,1474,5,2,1,1473,0.0,2684,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000048ACA2,99.53,1474,5,2,1,1473,0.0,2684,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000774040,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000695978,99.53,1474,5,2,1,1500,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000832597,99.59,1471,4,2,1,1487,0.0,2682,99
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000051E9E,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00007A220D,99.53,1474,5,2,1,1475,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00007B0CD4,99.53,1474,5,2,1,1475,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000025D69C,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000826EAF,99.53,1474,5,2,1,1489,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00007F2B41,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000468437,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00000589C9,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000037144,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00007BC3DC,99.53,1474,5,2,1,1493,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000012200D,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00004EF4A8,99.53,1474,5,2,1,1494,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00001F158F,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00000C125A,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000186B7C,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00004AAA6D,99.53,1474,5,2,1,1493,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000029715,99.53,1474,5,2,1,1486,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00001BA799,99.53,1474,5,2,1,1490,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000042A81F,99.53,1474,5,2,1,1493,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00003CF2A3,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00003F90D9,99.53,1474,5,2,1,1477,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00004F66E9,99.53,1474,5,2,1,1509,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000006F338,99.53,1474,5,2,1,1473,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00007DF57E,99.53,1474,5,2,1,1492,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS0000804FB8,99.53,1474,5,2,1,1501,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000008006E,99.53,1475,4,3,1,1494,0.0,2682,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS00008DD3FF,99.53,1474,4,3,1,1500,0.0,2680,100
m150115_081355_sherri_c100725952230000001823153204301532_s1_p0/30523/ccs,gnl|ohnosequences.db.rna16s|URS000051015B,99.52,1473,5,2,1,1499,0.0,2680,100
...

eparejatobes commented 7 years ago

> this is expected for the taxa that have no direct assignments (i.e. no hits, no pidents)

In that case it should be the weighted average of their descendants
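
A minimal sketch of what such a weighted average could look like, assuming each taxon carries a direct-assignment count and an average pident, and that the weights are those direct counts. The `Taxon` class, its field names, and the choice of weights are hypothetical and only illustrate the idea; they are not MG7's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Taxon:
    # Hypothetical node; names and fields are illustrative, not MG7's model.
    name: str
    direct_count: int = 0
    direct_avg_pident: Optional[float] = None  # None when there are no direct hits
    children: List["Taxon"] = field(default_factory=list)


def accumulated_pident(taxon: Taxon) -> Tuple[int, Optional[float]]:
    """Return (total count, count-weighted average pident) over the taxon's subtree."""
    total = 0
    weighted_sum = 0.0
    # The taxon's own direct assignments contribute only if it has any.
    if taxon.direct_avg_pident is not None:
        total = taxon.direct_count
        weighted_sum = taxon.direct_avg_pident * taxon.direct_count
    # Each descendant subtree contributes in proportion to its count.
    for child in taxon.children:
        count, avg = accumulated_pident(child)
        if avg is not None:
            total += count
            weighted_sum += avg * count
    if total == 0:
        return 0, None  # nothing assigned anywhere in this subtree
    return total, weighted_sum / total
```

With this sketch, a taxon without direct assignments gets the average of its descendants instead of "-", and a subtree with no assignments at all still reports no value.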

laughedelic commented 7 years ago

> In that case it should be the weighted average of their descendants

I can't find the discussion here, but I remember that we talked about it and, if I remember right, decided not to implement this feature in v1.0. I could be forgetting or mixing it up, of course.

> Also I see a lot of no-hits. Which params were used here?

See the defaults code.

eparejatobes commented 7 years ago

OK fine about average identity. With respect to the no-hits issue, the only reason I can think of is word size. Next time you run this (after fixing those bugs above) use the same word size that is the global default: word_size(46).
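
As a toy illustration of why word size matters for no-hits: a BLAST-style nucleotide search only starts an alignment from an exact match of `word_size` consecutive bases, so whether a read finds any seed against a reference depends on that parameter. The function below is a hypothetical sketch of that seeding condition, not BLAST's implementation.

```python
def shares_exact_word(query: str, subject: str, word_size: int) -> bool:
    """True if query and subject share at least one exact substring of length word_size."""
    if word_size <= 0:
        raise ValueError("word_size must be positive")
    # Index every word of the subject, then look for any query word among them.
    subject_words = {
        subject[i:i + word_size] for i in range(len(subject) - word_size + 1)
    }
    return any(
        query[i:i + word_size] in subject_words
        for i in range(len(query) - word_size + 1)
    )


# A single mismatch in the middle of a short read:
subject = "AAAACCCCGGGGTTTT"
query = "AAAACCCCTGGGTTTT"
print(shares_exact_word(query, subject, 8))  # a seed exists at this word size
print(shares_exact_word(query, subject, 9))  # no seed at this word size
```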

laughedelic commented 7 years ago

> fine about average identity

I opened https://github.com/ohnosequences/mg7/issues/126 not to forget to do it later.

> Next time you run this (after fixing those bugs above) use the same word size that is the global default: word_size(46)

OK

eparejatobes commented 7 years ago

Oh, I almost forgot: what about the coverage filter? Is it 100%?

laughedelic commented 7 years ago

Yes. The default filter for both Illumina and PacBio is qcovs == 100. Do you want to change it?

eparejatobes commented 7 years ago

For PacBio: 99 for coverage and 98.5 for identity
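
A minimal sketch of such a filter over the comma-separated BLAST rows quoted earlier in this thread, assuming pident is the third field and qcovs is the last one. The column positions, the function name `filter_hits`, and the use of `>=` for both thresholds are assumptions for illustration, not MG7's actual filter code.

```python
from typing import Iterable, Iterator, List


def filter_hits(
    rows: Iterable[List[str]],
    min_qcovs: float = 99.0,
    min_pident: float = 98.5,
) -> Iterator[List[str]]:
    """Yield only the rows that meet both thresholds.

    Assumes row[2] is pident and row[-1] is qcovs (a guess at the layout).
    """
    for row in rows:
        pident = float(row[2])
        qcovs = float(row[-1])
        if qcovs >= min_qcovs and pident >= min_pident:
            yield row


# Two rows shaped like the ones quoted in this thread (read ID shortened):
rows = [
    "read,gnl|ohnosequences.db.rna16s|URS000095DC7A,99.59,1474,4,2,1,1495,0.0,2687,100".split(","),
    "read,gnl|ohnosequences.db.rna16s|URS0000832597,99.59,1471,4,2,1,1487,0.0,2682,99".split(","),
]
relaxed = list(filter_hits(rows))                  # qcovs >= 99
strict = list(filter_hits(rows, min_qcovs=100.0))  # the previous default, qcovs == 100
```

Under these assumptions the second row, with coverage 99, survives the relaxed filter but is dropped by the strict one.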

laughedelic commented 7 years ago

@eparejatobes Done. Review the results please:

s3://resources.ohnosequences.com/ohnosequences/mg7/1.0.0-M5-15-gfb8a06a/test/

eparejatobes commented 7 years ago

I'm working on this

laughedelic commented 7 years ago

@eparejatobes told me that it's fine. Merging this.