sanjaynagi / AmpSeeker

A snakemake workflow for amplicon sequencing
https://sanjaynagi.github.io/AmpSeeker/
0 stars 3 forks source link

MultiQC, FastQC and bcftools stats implementation #23

Closed ChabbyTMD closed 1 year ago

ChabbyTMD commented 1 year ago

Resolves #20 Added functionality for read and other metric quality control as well as MultiQC report generation.

sanjaynagi commented 1 year ago

Awesome, looks great Trevor!!

Not sure why CI is failing - a quick google says it could be to do with the python version. Could try pinning to python to 3.7 in the ampseq_cli.yaml? We should actually be doing this anyway, added an issue #24

sanjaynagi commented 1 year ago

Ahhh, didnt realise we had already set python 3.9 in the Ampseq_python.yaml. I think we should stick to 3.9 for that.

The MultiQC rule will be using the python version that's install in Ampseq_cli.yaml, however, so we need to set the python version in there. Might make sense to also attempt using python=3.9 for that aswell. Have just done so, lets see if CI works

sanjaynagi commented 1 year ago

Trying 3.7 + multiQC=1.6

sanjaynagi commented 1 year ago

Ok ive tried to create the envs locally, and then using the mamba 'solved' yaml and using that instead in the workflow. This might end up being a bit of a pain when we want to add packages, but we shouldn't need to do that too often now.

EDIT this didn't work either, dropping that idea.

sanjaynagi commented 1 year ago

I hate conda

sanjaynagi commented 1 year ago

@ChabbyTMD To get over this conda issue, I suggest we use remove multiQC from the AmpSeeker-cli.yaml conda env and use a wrapper instead https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/multiqc.html.

Snakemake wrappers are like prepackaged conda envs with the idea of improving reproducibility. There is an example at the above page.

eddUG commented 1 year ago

@sanjaynagi The idea of using wrappers shall solve many pains but also enhance reproducibility and best practices. Had shelved this for our last call but unfortunately couldn't last through it.

I propose we replace all shell calls with wrappers where applicable - adding this as an issue might be helpful

sanjaynagi commented 1 year ago

hey @eddUG :) Indeed, I've thought about doing this before, but occasionally you get wrappers which either don't work or are inflexible for what you need (i.e they don't allow some combinations of parameters). Although I agree, I suggest we just do it as and when we need to, as going through all the shell commands now and changing them will take some time.

sanjaynagi commented 1 year ago

@ChabbyTMD Let me know if youd like me to take this one :)

(and this is always an option!)

ChabbyTMD commented 1 year ago

Hi Sanjay, I can do the multiqc replacement with the wrapper. In the prototyping stage though I had issues using the fastqc wrapper. It would always error out at runtime not quite sure why. Using the shell command instead solved the issue.

sanjaynagi commented 1 year ago

Cool. Well If the wrapper doesn't work, then perhaps just use a shell command and a separate conda env multiqc.yaml :)

sanjaynagi commented 1 year ago

Just tried a few things to get the conda working - moving fastqc into its own ampseeker-qc.yaml.

FYI, where we use wrappers, there is no need to specify a conda env aswell.