Sample QC pipeline improvements TO DOs.

hardingnj commented 5 years ago

* Better description of how the genotype frequency files are created.
* Add a description of how to run the merge step.
* Fix the `${PWD}` case in the README.
* Consider how to handle missing files/samples.

alimanfoo commented 5 years ago

Hi @hardingnj, just checking in here, are these still relevant, or have they been addressed already?

hardingnj commented 5 years ago

These are yet to do. Migrated from Alex's input plus a couple of other thoughts.

amakunin commented 5 years ago

Hi @hardingnj. A few comments upon vobs-funestus integration:

* suggest moving memory requirements from `params.req` in the Snakefile to a separate `cluster.json` config: syntax of memory requests is specific to cluster architecture, also memory required depends on the reference genome
* in my case, I had to use `$(pwd)` instead of `${pwd}` in the submission command
* maybe add X/autosomal ratio calculation to QC summary script?

hardingnj commented 5 years ago

* suggest moving memory requirements from `params.req` in the Snakefile to a separate `cluster.json` config: syntax of memory requests is specific to cluster architecture, also memory required depends on the reference genome

Yes, good idea. I'm not sure exactly how to do this, though. What were you thinking? Defining the memory requirements of each rule separately in `cluster.json`?

* in my case, I had to use `$(pwd)` instead of `${pwd}` in the submission command

That's a typo. Originally I had `${PWD}`, and Alistair changed it to `$(pwd)`. I didn't realise what he was doing, so I changed it to `${pwd}`. Oops!
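For anyone hitting the same problem, here is a small sketch of how the three spellings behave. It assumes a POSIX `sh` is on the path; the helper name `expand_in_shell` is made up for illustration and is not part of the pipeline.

```python
import os
import subprocess


def expand_in_shell(expression: str) -> str:
    """Return what a POSIX shell prints for the given expansion (hypothetical helper)."""
    # Drop any lowercase `pwd` variable so the check does not depend on the caller's environment.
    env = {key: value for key, value in os.environ.items() if key != "pwd"}
    result = subprocess.run(
        ["sh", "-c", f'printf "%s" "{expression}"'],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


upper_variable = expand_in_shell("${PWD}")      # shell variable holding the working directory
command_substitution = expand_in_shell("$(pwd)")  # output of the `pwd` command
lower_variable = expand_in_shell("${pwd}")      # a different, normally unset variable
```

Shell variable names are case-sensitive, so `${pwd}` is not the same variable as `${PWD}` and normally expands to an empty string, which would explain the broken path in the submission command.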

* maybe add X/autosomal ratio calculation to QC summary script?

We deliberately left this out as we can't assume which contigs/chromosomes will be present. Brandy is building with 1000s of contigs.
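If this were ever added as an opt-in step, one possible shape is sketched below. It assumes the user supplies the X and autosomal contig names (for example through a config file) rather than the pipeline guessing them. The function name, the input dictionary of per-contig coverage, and the contig names in the usage example are all hypothetical.

```python
from statistics import median


def x_autosome_ratio(coverage, x_contigs, autosomal_contigs):
    """Ratio of median X coverage to median autosomal coverage, or None if undefined.

    `coverage` maps contig name to mean coverage; contig lists come from the user.
    """
    # Ignore any listed contig that is absent from this reference build.
    x_values = [coverage[name] for name in x_contigs if name in coverage]
    autosomal_values = [coverage[name] for name in autosomal_contigs if name in coverage]
    if not x_values or not autosomal_values:
        return None
    autosomal_median = median(autosomal_values)
    if autosomal_median == 0:
        return None
    return median(x_values) / autosomal_median


# Illustrative values only, not real data.
example_coverage = {"2L": 30.0, "2R": 30.0, "X": 15.0}
example_ratio = x_autosome_ratio(example_coverage, ["X"], ["2L", "2R"])
no_x_ratio = x_autosome_ratio(example_coverage, ["X_unplaced"], ["2L", "2R"])
```

Returning `None` when the named contigs are missing keeps the summary usable for builds with thousands of contigs where no X/autosome assignment is available.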

amakunin commented 5 years ago

Got it!

Yes, setting memory requirements per rule would make sense, as contamination estimation needs significantly more RAM than the other rules.
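A rough sketch of what a per-rule `cluster.json` could look like, written here as Python that builds and serialises the config. Snakemake's `--cluster-config` option reads a file like this and exposes the values as `{cluster.mem}` in the `--cluster` submission string, with `__default__` applying to any rule without its own entry. The rule name `contamination`, the key `mem`, and the memory values are assumptions for illustration, not taken from the actual Snakefile.

```python
import json

# Hypothetical rule names and memory values (in MB); adjust to the real rules and cluster.
cluster_config = {
    "__default__": {"mem": 4000},
    "contamination": {"mem": 16000},
}


def resolve_cluster_settings(rule_name, config=cluster_config):
    """Mimic the fallback: start from __default__, then apply rule-specific overrides."""
    settings = dict(config["__default__"])
    settings.update(config.get(rule_name, {}))
    return settings


# Text that would be saved as cluster.json next to the Snakefile.
cluster_json = json.dumps(cluster_config, indent=4)

contamination_settings = resolve_cluster_settings("contamination")
other_rule_settings = resolve_cluster_settings("some_other_rule")
```

Keeping the values in a separate file means the cluster-specific request syntax stays in the submission command, and memory can be tuned per reference genome without editing the Snakefile.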
