I was unable to find a command-line invocation of BRASS through the links in the first comment of this issue. Instead I found https://dockstore.org/containers/quay.io/wtsicgp/dockstore-cgpwgs:2.1.0?tab=info and https://github.com/cancerit/dockstore-cgpwgs
BRASS is structured as a sequence of steps:
input
cover
merge
normcn
group
isize
filter
split
assemble
grass
tabix
They can be run one by one (using `-p` to indicate which step) or all together (no `-p`). Only `input`, `cover`, and `assemble` can be run in parallel.
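As a hedged illustration of the step-by-step mode, a single nextflow process (DSL2-style syntax assumed) could wrap one `brass.pl` step at a time. The `-p` usage follows the description above; the other flags shown (`-o`, `-t`, `-n`) and the omission of reference-file options are assumptions to verify against `brass.pl -h` inside the container.

```nextflow
// Sketch only: wraps a single BRASS step selected via -p.
// Flags other than -p are assumptions; reference-file options are omitted here.
process runBRASSStep {
    container 'quay.io/wtsicgp/dockstore-cgpwgs:2.1.0'

    input:
    tuple val(step), path(tumour_bam), path(normal_bam)

    output:
    path "brass_out"

    script:
    """
    brass.pl -p ${step} -o brass_out -t ${tumour_bam} -n ${normal_bam}
    """
}
```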
The `dockstore-cgpwgs` implementation runs `input` and `cover` in parallel initially. Later on it runs the rest of the steps, omitting the `-p` parameter and using an output file from `ascat`. `ascat` from `ascatNGS` is also installed in the `wtsicgp/dockstore-cgpwgs:2.1.0` image.
We might break this up into three processes, all run with `wtsicgp/dockstore-cgpwgs:2.1.0`:

- `BRASSprep` - run the `input` and `cover` steps simultaneously, as in dockstore-cgpwgs
- `ascat` - run as in the dockstore-cgpwgs implementation; this may be reused by other processes we might like to incorporate, including `hrdetect`
- `BRASS` - run the remaining steps as in dockstore-cgpwgs

There might be some reference files which we will have to incorporate. I'm not sure if they are stored inside the docker or not.
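A minimal workflow-level sketch of that three-process layout might look like the following; the process names, channel shapes, and how the `ascat` output is passed to `BRASS` are all assumptions for illustration.

```nextflow
// Hedged sketch of the proposed split; the three processes are assumed to be
// defined elsewhere, each using the wtsicgp/dockstore-cgpwgs:2.1.0 container.
workflow SANGER_SV {
    take:
    bams                            // tuple: [ tumour_bam, normal_bam ]

    main:
    prep_out  = runBRASSprep(bams)          // `input` + `cover`, run in parallel
    ascat_out = runAscat(bams)              // reusable by other processes, e.g. hrdetect
    runBRASS(bams, prep_out, ascat_out)     // remaining steps, no -p, consumes ascat output
}
```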
This might (not) be relevant: https://dockstore.org/containers/quay.io/wtsicgp/dockstore-cgpwgs:1.1.5?tab=files
Update:
`bam_stats` is available on both their combined docker `quay.io/wtsicgp/dockstore-cgpwgs` and a dedicated `PCAP-core` docker, `quay.io/wtsicgp/pcap-core`. BRASS, starting from bam files, has been pushed to the `feature/BRASS` branch. I used `quay.io/wtsicgp/dockstore-cgpwgs` version 2.1.0 because that was the latest version at the time the PCAWG paper was published; I used this docker by setting `-profile pcawg_versionControl`.
In another set of configurations, I tested the dedicated dockers with the most current versions:

- quay.io/wtsicgp/brass:v6.3.4
- quay.io/wtsicgp/ascatNgs:4.4.0
- quay.io/wtsicgp/pcap-core:5.5.0

I used these dockers by setting `-profile pcawg_updated`. The dedicated dockers seemed slightly faster:

| Process | duration pcawg_updated | duration pcawg_versionControl |
|---|---|---|
| runPCAP | 25m | 31m |
| runPCAP | 34m | 1h 2m |
| runAscat | 2h 18m | 1h 49m |
| runBRASS | 7h 12m | 8h 58m |
`memory` and `cpus` are equal for each docker configuration, and it was run on the same sample bams. `runPCAP` appears twice because it is run on two bams in parallel. For the purposes of this test I did not break up `BRASS` into multiple steps, but I think this gives us an idea of the runtimes anyway.
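For reference, a minimal sketch of how the two profiles described above might pin containers in `nextflow.config` is below; the actual selectors and process names in the repo are assumptions and may differ.

```nextflow
// Hedged sketch of nextflow.config profiles; withName selectors are assumptions.
profiles {
    pcawg_versionControl {
        process.container = 'quay.io/wtsicgp/dockstore-cgpwgs:2.1.0'
    }
    pcawg_updated {
        process {
            withName: 'runBRASS.*' { container = 'quay.io/wtsicgp/brass:v6.3.4' }
            withName: 'runAscat'   { container = 'quay.io/wtsicgp/ascatNgs:4.4.0' }
            withName: 'runPCAP'    { container = 'quay.io/wtsicgp/pcap-core:5.5.0' }
        }
    }
}
```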
I was able to dramatically cut down on total elapsed time for BRASS by parallelizing jobs for the `input` and `cover` steps (see commit fb9a68e8c7f2c795a6c9c19ce9e3d5fc5eca3f1d), and I just ran the remaining steps in a single job (see attached timeline.html). However, I could further break up the remaining steps in the `runBRASS` step to get even more parallelization. At this point I'm wondering if it is overkill and how much we value a "clean"-looking DAG for presentation purposes. With `runBRASSInput` and `runBRASSCover` already taken care of, I could further break down `runBRASS` into 7 nextflow processes:
merge
group
isize
normcn # group isize normcn can be run in parallel after merge
filter
split # filter and split can be run one after the other in the same process
assemble # can be broken up into 24 parallel processes, similar to the way `input` and `cover` were optimized
grass
tabix # grass and tabix can be run one after the other in the same process
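To illustrate the `assemble` fan-out mentioned in the list above, a hedged sketch mirroring the `runBRASSInput`/`runBRASSCover` approach might look like this; the `-i` index flag and the other options shown are assumptions to check against `brass.pl -h`, and the channel wiring is illustrative only.

```nextflow
// Sketch only: scatter the assemble step across 24 indexed jobs; the downstream
// grass/tabix process would then gather the shared BRASS output directory.
process runBRASSAssemble {
    container 'quay.io/wtsicgp/dockstore-cgpwgs:2.1.0'

    input:
    tuple path(tumour_bam), path(normal_bam), path(brass_outdir), val(index)

    output:
    path brass_outdir

    script:
    """
    brass.pl -p assemble -i ${index} -o ${brass_outdir} -t ${tumour_bam} -n ${normal_bam}
    """
}

workflow {
    // fan out indices 1..24 over the same tumour/normal bams and BRASS workspace
    assemble_ch = Channel
        .fromPath(params.tumour_bam)
        .combine(Channel.fromPath(params.normal_bam))
        .combine(Channel.fromPath(params.brass_outdir, type: 'dir'))
        .combine(Channel.of(1..24))
    runBRASSAssemble(assemble_ch)
}
```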
@gongyixiao @stevekm, do you think it's worth it to further break down the steps? timeline.html.zip
Sidenote: `ascat` is also made up of three steps and I was able to parallelize the first step, which also improves processing time.
I think this can be left for further optimization. Let's get everything running and then do the optimization.
BRASS (Sanger pipeline): https://dockstore.org/containers/registry.hub.docker.com/sevenbridges/pcawg_sanger_sbg_modified/pcawg_sanger_vc_sbg_modified
Convert `Dockstore.cwl` into nextflow-friendly scripts.