ohnosequences / mg7

Configurable and scalable 16S metagenomics data analysis
https://goo.gl/y3rZFD
GNU Affero General Public License v3.0

Fasta input for pipeline without Flash #24

Closed laughedelic closed 8 years ago

laughedelic commented 8 years ago

#22 doesn't solve the issue:

We want to pass FASTA files as input. Without #21, the pipeline is as follows:

  1. split (currently splits files into chunks of 2 or 4 rows, which doesn't work with FASTA, because a sequence may span several lines)
  2. blast (takes each chunk, reads its first 2 rows, and writes them to a FASTA file for BLAST)
  3. merge
  4. assign
  5. count

laughedelic commented 8 years ago

Using this in split instead of .grouped(...) could help.
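
To illustrate the problem with fixed-size line grouping, here is a minimal sketch. It assumes multi-line FASTA input; `FastaSplit`, `FastaRecord`, and `records` are hypothetical names made up for this example, not mg7 or fastarious code. Grouping lines by 2 cuts a record in half as soon as a sequence spans more than one line, whereas starting a new record at each `>` header keeps records intact.

```scala
object FastaSplit {
  // Hypothetical record type for illustration only (not the fastarious API).
  final case class FastaRecord(header: String, sequence: String)

  // Groups lines into records: a new record starts at every line beginning with '>'.
  // Blank lines are ignored, and anything before the first header is skipped.
  def records(lines: Iterator[String]): Iterator[FastaRecord] = {
    val buffered = lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .dropWhile(line => !line.startsWith(">"))
      .buffered

    new Iterator[FastaRecord] {
      def hasNext: Boolean = buffered.hasNext

      def next(): FastaRecord = {
        val header = buffered.next().stripPrefix(">")
        val sequence = new StringBuilder
        // Consume sequence lines until the next header or the end of input.
        while (buffered.hasNext && !buffered.head.startsWith(">"))
          sequence.append(buffered.next())
        FastaRecord(header, sequence.toString)
      }
    }
  }
}
```

For the input lines `>r1`, `ACGT`, `TTGA`, `>r2`, `GGCC`, `.grouped(2)` puts `TTGA` and `>r2` into the same chunk, while `FastaSplit.records` yields two records, `r1` with `ACGTTTGA` and `r2` with `GGCC`.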

eparejatobes commented 8 years ago

I just released fastarious 0.4.0. You can use this: https://github.com/ohnosequences/fastarious/blob/v0.4.0/src/test/scala/FastaTests.scala#L92-L105 and take chunkSize records from that iterator. I would do it myself, but I'm a bit lost with the configuration.
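
As a rough sketch of the "take chunkSize from that iterator" idea: once the input is available as an iterator of parsed records, chunking by record count can be done with the standard `Iterator.grouped`. The names `FastaChunks`, `Rec`, `render`, and `chunks` below are assumptions for illustration; the actual fastarious types and the mg7 split step may look different.

```scala
object FastaChunks {
  // Hypothetical stand-in for a parsed FASTA record (not the fastarious type).
  final case class Rec(header: String, sequence: String)

  // Renders one record as FASTA text, wrapping the sequence at `width` characters.
  def render(rec: Rec, width: Int = 70): String =
    (s">${rec.header}" +: rec.sequence.grouped(width).toSeq).mkString("\n")

  // Splits the records into chunks of at most `chunkSize` records each,
  // returning the FASTA text of every chunk. Grouping happens on whole records,
  // so no sequence is ever cut between two chunks.
  def chunks(records: Iterator[Rec], chunkSize: Int): Iterator[String] =
    records
      .grouped(chunkSize)
      .map(group => group.map(rec => render(rec)).mkString("\n"))
}
```

Each returned string is a complete FASTA fragment, so a later step could write it to a file as-is rather than reading a fixed number of rows from the chunk.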

laughedelic commented 8 years ago

Ok

laughedelic commented 8 years ago

This is the last feature I'm adding to M2; it's already way too big. Running tests in #25 and releasing.

laughedelic commented 8 years ago

LGTM. I've merged the fixes from #26 in. Now I'm merging this and will test it tomorrow.

Approved with PullApprove