This repository contains the source code and input data used for the data analysis and modeling for: Loss of multi-level 3D genome organization during breast cancer progression (preprint available soon).
Input data download and subsequent analyses are automated using Nextflow and Singularity/Apptainer.
Docker images are hosted on GHCR and can be found on the Packages page of this repository. Images were generated by the build*dockerfile.yml GHA workflows using the Dockerfiles from the containers folder.
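Nextflow normally pulls the required images automatically at run time, but if you want to fetch one manually with Singularity/Apptainer, an invocation along the following lines should work (the organization, image name, and tag below are placeholders, not actual package names):

```bash
# Pull a Docker image from GHCR and convert it to a local SIF file.
# Replace <org>, <image> and <tag> with an actual package listed on
# this repository's Packages page.
apptainer pull docker://ghcr.io/<org>/<image>:<tag>
```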
Nextflow workflows under the workflows folder were developed and tested using Nextflow v22.10.7, and should in principle work with any version supporting Nextflow DSL2. Each workflow is paired with a config file (see the configs folder). As an example, workflows/fetch_data.nf is paired with configs/fetch_data.config.
Please make sure Nextflow is properly installed and configured before running any of the workflows.
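As a quick sanity check, the snippet below verifies the Nextflow installation and launches a single workflow together with its paired config. This is a minimal sketch: the run_*.sh launcher scripts listed further down presumably wrap calls of this form, and additional options (e.g. an executor profile) may be needed on your system.

```bash
# Verify that Nextflow is on the PATH and report its version
nextflow -version

# Launch one workflow with its paired config (illustrative invocation)
nextflow run workflows/fetch_data.nf \
  -c configs/fetch_data.config
```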
The following workflows should be executed first, as they download and prepare files required by other workflows.
run_fetch_data.nf
run_preprocessing.nf
The fetch_data.nf workflow requires internet access and can fail for various reasons (e.g. connection reset by peer, service temporarily unavailable, etc.). In case the workflow fails, wait a few minutes, then relaunch it.
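When relaunching after a transient failure, Nextflow's -resume flag lets the workflow reuse cached results instead of starting from scratch. A minimal sketch, assuming the workflow is launched directly with nextflow run rather than through a wrapper script:

```bash
# Re-run the data download workflow, reusing the results of tasks
# that already completed successfully before the failure
nextflow run workflows/fetch_data.nf \
  -c configs/fetch_data.config \
  -resume
```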
The execution order of the remaining workflows depends on which parts of the data analysis you are interested in re-running. The following order assumes you want to re-run the entire analysis (see the sketch after the list for a way to run all steps in sequence). If you only want to re-run some of the steps, feel free to get in touch with us to find out which steps you need to run.
run_nfcore_hic.sh
run_compress_nfcore_hic_output.sh
run_nfcore_rnaseq.sh
run_nfcore_chipseq.sh
run_diff_expression_analysis.sh
run_comparative_analysis_hic.sh
run_detect_structural_variants.sh
run_compartment_analysis.sh
run_tad_analysis.sh
run_call_tad_cliques_workflow.sh
run_chrom3d_workflow.sh
run_comparative_analysis.sh
run_fish.sh
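If you do want to re-run everything, a simple way to enforce the order above is to run the launcher scripts sequentially and stop at the first failure, as in the sketch below (assuming the scripts are executable from the repository root and take no mandatory arguments):

```bash
#!/usr/bin/env bash
# Run all analysis steps in the order listed above, aborting on the first error.
set -euo pipefail

scripts=(
  run_nfcore_hic.sh
  run_compress_nfcore_hic_output.sh
  run_nfcore_rnaseq.sh
  run_nfcore_chipseq.sh
  run_diff_expression_analysis.sh
  run_comparative_analysis_hic.sh
  run_detect_structural_variants.sh
  run_compartment_analysis.sh
  run_tad_analysis.sh
  run_call_tad_cliques_workflow.sh
  run_chrom3d_workflow.sh
  run_comparative_analysis.sh
  run_fish.sh
)

for script in "${scripts[@]}"; do
  echo "### Running ${script}..."
  ./"${script}"
done
```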
Inside the configs folder there are the following base configs:
base_hovig.config
base_linux.config
base_saga.config
base_macos.config
These configs are passed to all workflows and define the available computation resources. You will most likely have to update one of them with the resources available on your machine/cluster.
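For reference, this is roughly how a base config would be combined with a workflow-specific config when launching a workflow by hand; this is an illustrative invocation only, and the run_*.sh scripts may already select the appropriate base config for you.

```bash
# Launch a workflow with its paired config plus the base config that
# matches your machine (base_linux.config is used here as an example)
nextflow run workflows/fetch_data.nf \
  -c configs/fetch_data.config \
  -c configs/base_linux.config
```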