malariagen / vector-tools

Scripts, pipelines, docs to support data production work for MalariaGEN vector projects.
MIT License

Sample QC pipeline improvements TO DOs. #12

Open hardingnj opened 5 years ago

hardingnj commented 5 years ago
alimanfoo commented 5 years ago

Hi @hardingnj, just checking in here, are these still relevant, or have they been addressed already?

hardingnj commented 5 years ago

These are still to do. They were migrated from Alex's input, plus a couple of other thoughts.

amakunin commented 5 years ago

Hi @hardingnj. A few comments upon vobs-funestus integration:

hardingnj commented 5 years ago
> * suggest moving memory requirements from `params.req` in the Snakefile to a separate `cluster.json` config: the syntax of memory requests is specific to the cluster architecture, and the memory required also depends on the reference genome

Yes, good idea. I'm not sure exactly how to do this, though. What were you thinking? Defining the memory requirements of each rule separately in `cluster.json`?

> * in my case, I had to use `$(pwd)` instead of `${pwd}` in the submission command

That's a typo. Originally I had `${PWD}`, and Alistair changed it to `$(pwd)`. I didn't realise what he was doing, so I changed it to `${pwd}`. Oops!
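
For reference, the three spellings behave differently in a POSIX shell. A minimal sketch follows; the variable names are illustrative, and it assumes no variable named `pwd` has been set in the environment:

```shell
# $(pwd) is command substitution: it runs `pwd` and captures its output.
dir_from_cmd="$(pwd)"

# ${PWD} is a variable the shell keeps updated with the current directory.
dir_from_var="${PWD}"

# ${pwd} is a different, lowercase variable that is normally unset,
# so it expands to an empty string (the :- avoids an error under `set -u`).
dir_from_typo="${pwd:-}"

echo "command substitution: ${dir_from_cmd}"
echo "PWD variable:         ${dir_from_var}"
echo "lowercase pwd:        '${dir_from_typo}'"
```

So `$(pwd)` and `${PWD}` should both give the working directory, while `${pwd}` silently gives nothing, which is presumably why the submission command failed.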

> * maybe add X/autosomal ratio calculation to QC summary script?

We deliberately left this out, as we can't assume which contigs/chromosomes will be present. Brandy is building with thousands of contigs.

amakunin commented 5 years ago

Got it!

Yes, setting memory requirements per rule would make sense, as contamination estimation needs significantly more RAM than the other rules.
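
Something along these lines might work as a starting point. It is only a sketch: the rule name `estimate_contamination`, the key name `mem`, and the memory values are placeholders rather than what the Snakefile actually uses, and the `bsub` flag in the comment assumes an LSF-style scheduler.

```shell
# Write a cluster config with a default plus a per-rule override.
# Rule name and memory values (MB) are hypothetical; adjust them to the
# cluster and the reference genome.
cat > cluster.json <<'EOF'
{
    "__default__": {"mem": 4000},
    "estimate_contamination": {"mem": 32000}
}
EOF

# Snakemake fills {cluster.mem} from the entry matching each rule and
# falls back to __default__ for rules without their own entry.
# Example submission (scheduler syntax is an assumption, not run here):
#   snakemake --jobs 10 --cluster-config cluster.json \
#       --cluster "bsub -M {cluster.mem}"
```

That keeps the scheduler-specific syntax in the submission command and the per-rule numbers in one file. Note that `--cluster-config` is deprecated in newer Snakemake releases in favour of profiles, so this applies to the versions that still support it.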