tanaes / snakemake_shotqual

Snakemake rules for parallel QC of shotgun data

More elegant handling of low-quality samples #5

Open tanaes opened 7 years ago

tanaes commented 7 years ago

Currently, samples with very few reads cause issues in certain steps of the pipeline.

Examples:

  1. HUMAnN2: low-read-count samples cause an error when the biom file is created, which exits the pipeline.
  2. Mash: very-low-read-count samples have too little sequence to generate N k-mers, so the mash rule fails.
  3. Distance matrix (DM) calculation, for Mash or HUMAnN2: rarefaction problem; differences in sample depth obscure other trends.

Currently I've been handling this by rerunning the computation with a config file in which the low-read samples are commented out. It would be preferable to do this algorithmically in the script.
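
One possible way to do the filtering algorithmically is sketched below: count the reads in each sample's FASTQ file and split the samples into those that meet a minimum depth and those that don't. This is only a sketch under assumptions. The function names `count_fastq_reads` and `split_samples_by_depth`, the `min_reads` threshold, and the sample-name-to-path mapping are all hypothetical and do not reflect the pipeline's actual config layout.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming 4 lines per record.

    Files ending in .gz are read through gzip; anything else as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def split_samples_by_depth(sample_files, min_reads):
    """Split samples on read count.

    sample_files: dict mapping sample name -> FASTQ path (hypothetical layout).
    min_reads: hypothetical threshold; samples below it are set aside.
    Returns (kept, dropped), each a dict of sample name -> read count.
    """
    kept, dropped = {}, {}
    for sample, path in sample_files.items():
        n_reads = count_fastq_reads(path)
        if n_reads >= min_reads:
            kept[sample] = n_reads
        else:
            dropped[sample] = n_reads
    return kept, dropped
```

The `kept` sample names could then be used in place of the full sample list when building inputs for the HUMAnN2, Mash, and distance matrix rules, while `dropped` could be written to a log so the excluded samples stay visible. A suitable `min_reads` would probably differ per step and would need to be chosen empirically.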