kapsakcj / nanoporeWorkflow

:dna: Shell scripts for working with bacterial isolate Nanopore sequence data on CDC servers
MIT License

merge Curtis dev branch #1

Closed kapsakcj closed 5 years ago

kapsakcj commented 5 years ago

This PR adds:

USAGE (same as the first):

run_01_basecall-w-gpu.sh output-directory/ fast5-directory/

These scripts are written to take advantage of the Tesla V100 GPU available on node98, since this kind of GPU is not available through qsub/UGE just yet. They need to be run while manually logged into node98; do not run them via qsub. This also takes advantage of the high-accuracy basecalling model in Guppy v3.0.3, which takes ~8X longer than the fast model according to ONT but will give us slightly more accurate reads.
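For illustration only, a guard like the following could make the "node98 only" requirement explicit at the top of a script. This is a minimal sketch, not what the PR necessarily does: the function name `require_gpu_node` and the use of the short hostname are assumptions.

```shell
# Hypothetical guard: refuse to continue anywhere but the GPU node.
# "node98" comes from the PR description; the function name and the
# short-hostname convention are assumptions made for this sketch.
require_gpu_node() {
  # $1 = required short hostname, $2 = hostname to check
  if [ "$2" != "$1" ]; then
    echo "ERROR: log into $1 and run this directly (not via qsub); current host is $2" >&2
    return 1
  fi
  return 0
}

# Typical call near the top of a script:
#   require_gpu_node node98 "$(hostname -s)" || exit 1
```

Passing the hostname in as an argument, rather than calling `hostname` inside the function, keeps the check easy to exercise on any machine.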

This also means that I will need to create a script separate from workflows/workflow.sh, called workflows/workflow-without-basecalling.sh or something, so that the basecalling is performed first, followed by the rest of the pipeline.

Intermediate files are written to $tmpdir, which is created within /tmp/pjx8 because the I/O is faster there than in the GWA.
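As a rough sketch of how a scratch directory like `$tmpdir` might be set up, assuming `mktemp` is available on the node: the helper name `make_tmpdir`, the `guppy.` prefix, and the cleanup trap are all hypothetical and may not match the script in this PR.

```shell
# Hypothetical helper: create a unique scratch directory under a base path.
# /tmp/pjx8 is the location named in the PR; the function name and the
# "guppy." prefix are assumptions made for this sketch.
make_tmpdir() {
  # $1 = base directory; created first if it does not exist yet
  mkdir -p "$1" || return 1
  # mktemp prints the path of the new directory on stdout
  mktemp -d "$1/guppy.XXXXXX"
}

# Typical use:
#   tmpdir=$(make_tmpdir /tmp/pjx8) || exit 1
#   trap 'rm -rf "$tmpdir"' EXIT   # optional cleanup, also an assumption
```

A unique subdirectory per run avoids two runs clobbering each other's intermediate files on the local disk.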

I added a bunch of checks so that, when the script is rerun with the same output-directory, it checks whether files already exist before running guppy_basecaller, running qcat, linking sequencing_summary.txt, etc.
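The "check before running" pattern described above could look roughly like this. It is a generic sketch rather than the code in this PR: `run_unless_exists` is a hypothetical helper, and the paths in the commented example are placeholders.

```shell
# Hypothetical helper: run a command only if its expected output is missing.
run_unless_exists() {
  target="$1"   # file the step is expected to produce
  shift         # the remaining arguments are the command to run
  if [ -e "$target" ]; then
    echo "Found $target, skipping: $*" >&2
    return 0
  fi
  "$@"
}

# Illustrative call (variable names and paths are placeholders):
#   run_unless_exists "$outdir/sequencing_summary.txt" \
#     ln -s "$tmpdir/sequencing_summary.txt" "$outdir/sequencing_summary.txt"
```

Wrapping each step this way lets an interrupted run be restarted with the same output-directory without redoing the slow basecalling step.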

kapsakcj commented 5 years ago

Ah...do not merge just yet plz.... Need to fix a couple of things

kapsakcj commented 5 years ago

@lskatz now it's ready for merging 😁