kapsakcj / nanoporeWorkflow

Shell scripts for working with bacterial isolate Nanopore sequence data on CDC servers
MIT License

add module for NanoPlot #17

Closed kapsakcj closed 4 years ago

kapsakcj commented 4 years ago

Add NanoPlot to the GPU basecalling workflow so that it runs after basecalling finishes. The call should go below this line: https://github.com/lskatz/nanoporeWorkflow/blob/228abfd6a790c67a5d4a468fafca64b8896876fe/workflows/run_01_basecall-w-gpu.sh#L122

# np_basecall-w-gpu.sh always sets its job name to 'guppy-gpu',
# so hold this job on that name (-hold_jid accepts job names) before running NanoPlot
qsub -hold_jid guppy-gpu -o log/nanoplot.log -j y ${thisDir}/../scripts/np_runsummary_nanoplot.sh

Need to make a new script, np_runsummary_nanoplot.sh, which includes something like this:

# standard command
NanoPlot --summary sequencing_summary.txt --loglength -o nanoplot/ -t 24 --N50 -p N20169-20-003
# broken down by barcode
NanoPlot --summary sequencing_summary.txt --loglength --barcoded -o nanoplot-barcoded/ -t 24 --N50 -p N20169-20-003

kapsakcj commented 4 years ago

Might be good to experiment with this flag with NanoPlot:

--drop_outliers Drop outlier reads with extreme long length.

Either that, or drop the --loglength flag.
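
One way to set up that comparison, as a rough sketch rather than anything from the repo: print one command per variant, each writing to its own output directory, so the plots can be checked side by side. The function name `nanoplot_variants` and the `nanoplot-*` directory names are made up for illustration; the flags are the ones already mentioned in this thread, and `NSLOTS` is assumed to be set by the scheduler (falling back to 1 otherwise).

```shell
# Sketch only: prints the candidate NanoPlot commands instead of running them.
# $1 = sequencing summary file, $2 = output prefix (run ID).
nanoplot_variants() {
  summary=$1
  prefix=$2
  # Options shared by all three variants.
  base="NanoPlot --summary $summary -t ${NSLOTS:-1} --N50 -p $prefix"
  printf '%s\n' \
    "$base --loglength -o nanoplot-loglength/" \
    "$base --loglength --drop_outliers -o nanoplot-drop-outliers/" \
    "$base -o nanoplot-linear/"
}
```

Calling `nanoplot_variants sequencing_summary.txt N20169-20-003` prints three commands to review; piping that output to `sh` would run them.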

kapsakcj commented 4 years ago

After some experimentation, my preferred NanoPlot command and parameters are:

runID=N20169-20-004
NanoPlot -t "$NSLOTS" --summary sequencing_summary.txt --loglength --N50 --prefix "$runID" -o nanoplot --maxlength 100000

--drop_outliers removes too many of the longer reads. It cut out everything above 25 kb (the cutoff differs from run to run, of course).
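
For the new script, here is a minimal sketch of how np_runsummary_nanoplot.sh could wrap the command above. This is not the committed script: the function name `run_nanoplot`, its argument order, and the `DRY_RUN` switch are assumptions added for illustration. When no thread count is given, it falls back to `$NSLOTS` (set by the grid engine inside a job) and then to 1.

```shell
# Hypothetical core of np_runsummary_nanoplot.sh (illustrative, not the real script).
# $1 = sequencing summary file, $2 = run ID used as the prefix, $3 = threads (optional).
run_nanoplot() {
  summary=$1
  run_id=$2
  threads=${3:-${NSLOTS:-1}}
  # Reuse the positional parameters to hold the command, one word per argument.
  set -- NanoPlot -t "$threads" --summary "$summary" --loglength --N50 \
    --prefix "$run_id" -o nanoplot --maxlength 100000
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Dry run: show the command that would be executed.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

With `DRY_RUN=1 run_nanoplot sequencing_summary.txt N20169-20-004 24` the command is only printed, which is handy for checking the parameters before submitting the job with qsub.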

kapsakcj commented 4 years ago

Done in https://github.com/lskatz/nanoporeWorkflow/commit/28cf476c0e18dbadf5cc65ad724bb68aa1e690cb