CUT_n_TAG

Pre-/post-processing pipelines and results for CUT&Tag data generated by the Neurogenomics Lab @ Imperial College London.

Results

Links to all results. Each subheader is the unique ID of a given sequencing batch, assigned by the Imperial BRC Genomics Facility.

EpiCompare

Native ChIP-seq (GSE66023) vs. ENCODE ChIP-seq (ENCSR000AKP-ENCFF038DDS)

CUT&Tag/CUT&Run/TIP-seq vs. ENCODE

Brian's reports

Sera's reports

Reference=ENCODE_H3K27ac: Comparison of CUT&Tag, CUT&Run and TIP-seq data generated by the Imperial Neurogenomics Lab vs. ENCODE.

HK5M2BBXY

Description: Initial test run of four samples (two H3K27ac + two H3K27me3). Libraries were accidentally merged across assay types (H3K27ac/H3K27me3) during the nf-core/atacseq run (will fix).

MultiQC report

ataqv report

NF execution report

NF execution timeline

All samples

Bulk analysis

Single cell H3K27ac

Downloading new samples

When BRC sends you an email letting you know they've finished sequencing your samples, follow these steps to download and prepare the data.

Note: File and folder names are just used as examples here. You'll need to adapt these to match your particular file/folder names.

Log onto HPC.
If you haven't done so already, set up your iRODS credentials (instructions here). You only need to do this once.
Move into the folder where you want to store your data.

Download the data with iRODS:

module load irods/4.2.0
iget -Pr /igfZone/home/di.hu/IGFQ001187_hu_10-5-2021_scCutandTag/fastq/2021-05-11/HL25VBBXY
cd HL25VBBXY

Unpack each .tar file:

tar -xvf IGFQ001187_hu_10-5-2021_scCutandTag_4_16_2021-05-11.tar 
tar -xvf IGFQ001187_hu_10-5-2021_scCutandTag_6_16_2021-05-11.tar

Remove the old files (once you're sure the previous step worked):

rm IGFQ001187_hu_10-5-2021_scCutandTag_4_16_2021-05-11.tar
rm IGFQ001187_hu_10-5-2021_scCutandTag_6_16_2021-05-11.tar
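
When a batch contains many archives, the unpack and remove steps above can be combined in a loop. The following is a minimal sketch, not part of the lab's pipeline: the function name `unpack_and_remove` is hypothetical, and it assumes every `.tar` file in the target folder should be extracted in place. Each archive is deleted only if `tar` reports success for it.

```shell
# Extract every .tar archive in a directory, deleting each archive only
# after tar reports success for it.
# Usage: unpack_and_remove HL25VBBXY   (defaults to the current directory)
unpack_and_remove() {
    dir=${1:-.}
    for archive in "$dir"/*.tar; do
        # Skip the unexpanded pattern when no .tar files are present.
        [ -e "$archive" ] || continue
        if tar -xf "$archive" -C "$dir"; then
            rm -- "$archive"
        else
            echo "Extraction failed, keeping $archive" >&2
            return 1
        fi
    done
}
```

Stopping at the first failed extraction leaves the remaining archives untouched, so the download does not need to be repeated.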

Optional: Change permissions recursively so that other members of your team can access and manipulate the files. Adapt the scope of the permissions as appropriate for your case.
```
chmod -R u=rwx,go=rx ../HL25VBBXY/
```

Pipelines

nf-core/atacseq

Platform: nf-core (nextflow + singularity/docker)
Discussion on adapting this pipeline for CUT&RUN data.

1. Setup containers on HPC

Docker isn't allowed on HPC by itself because it presents some security risk. Instead, follow these instructions to create an R-based Docker container (Rocker) inside a Singularity container.
By default, Singularity [bind mounts](https://singularity.lbl.gov/quickstart) /home/$USER, /tmp, and $PWD into your container at runtime.

mkdir -p /rds/general/user/$USER/ephemeral/tmp/
mkdir -p /rds/general/user/$USER/ephemeral/rtmp/

On HPC, Rocker containers can be run through Singularity with a single command much like the native Docker commands, e.g. "singularity exec docker://rocker/tidyverse:latest R"
!IMPORTANT! You may need to change the path of "/rds/general/user/$USER/home/R/x86_64-redhat-linux-gnu-library/3.6/" to the actual location of your R library.

Run Rocker within singularity

singularity exec -B /rds/general/user/$USER/ephemeral/tmp/:/tmp,/rds/general/user/$USER/ephemeral/tmp/:/var/tmp,/rds/general/user/$USER/ephemeral/rtmp/:/rds/general/user/$USER/home/R/x86_64-redhat-linux-gnu-library/3.6/ --writable-tmpfs docker://rocker/tidyverse:latest R
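
Because the `-B` argument is long and easy to mistype, it can help to assemble it from named variables. The sketch below only prints the command (a dry run) so you can inspect it before pasting it into your terminal. The function name `rocker_cmd` and its variable names are hypothetical; the paths are the example paths used above and may need adapting.

```shell
# Build the Singularity command from named pieces and print it.
# Nothing is executed; this is a dry run for inspection.
# Usage: rocker_cmd [username]   (defaults to $USER)
rocker_cmd() (
    user=${1:-$USER}
    base=/rds/general/user/$user
    tmp_dir=$base/ephemeral/tmp/
    rlib_tmp=$base/ephemeral/rtmp/
    # Change r_lib to the actual location of your R library.
    r_lib=$base/home/R/x86_64-redhat-linux-gnu-library/3.6/
    # Each bind is source:destination; multiple binds are comma-separated.
    binds="$tmp_dir:/tmp,$tmp_dir:/var/tmp,$rlib_tmp:$r_lib"
    echo "singularity exec -B $binds --writable-tmpfs docker://rocker/tidyverse:latest R"
)
```

The function body runs in a subshell, so its variables do not leak into your session.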

2. Download nf-core/atacseq container

Now you can download the nfcore/atacseq Singularity container from DockerHub.

This will download "atacseq_latest.sif" to your home directory (when run from there):
singularity pull docker://nfcore/atacseq:latest
Copy this .sif file to the cacheDir specified in your nextflow config file:
scp ~/atacseq_latest.sif /rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache/
Once the container is downloaded, you can specify it with the -profile flag in the main pipeline (see below).
More info on this process is on the lab Wiki.
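
The pull and copy steps can also be wrapped so that the image is only downloaded when it is not already in the cache. This is a sketch under assumptions: the function name `cache_atacseq_image` is hypothetical, and it relies on `singularity pull` writing the `.sif` file into the current working directory, which is why it changes into the cache directory first.

```shell
# Pull the nfcore/atacseq image into the given cache directory,
# skipping the download if the image is already there.
# Usage: cache_atacseq_image /path/to/.singularity-cache
cache_atacseq_image() {
    cache_dir=$1
    image=atacseq_latest.sif
    if [ -f "$cache_dir/$image" ]; then
        echo "already cached: $cache_dir/$image"
        return 0
    fi
    mkdir -p "$cache_dir" || return 1
    # Run the pull from inside the cache directory (in a subshell, so
    # your own working directory is unchanged).
    ( cd "$cache_dir" && singularity pull docker://nfcore/atacseq:latest )
}
```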

3. Prepare nextflow config file

The config file tells nextflow how to run on Imperial's HPC.

module load nextflow
Copy the config file to the expected location so HPC knows how to run nextflow properly:
scp hpc_config $HOME/.nextflow/config

4. Optional: Register Nextflow Tower

Register with Nextflow Tower according to Combiz's instructions to get real-time reports as the pipeline runs. Once registered, add the token to your config file.
Run the nextflow pipeline. See here for all parameter options.
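
One possible way to add the token is to append a `tower` block to the config file. The sketch below assumes the standard Nextflow `tower` configuration scope with its `accessToken` and `enabled` settings; the function name `add_tower_token` is hypothetical, and the lab's own `hpc_config` may already handle this differently, so check it first.

```shell
# Append a Nextflow Tower block to a config file, unless one exists.
# Usage: add_tower_token "$HOME/.nextflow/config" "<your token>"
add_tower_token() {
    config=$1
    token=$2
    # Do not add a second block if the file already has one.
    if grep -q '^tower' "$config" 2>/dev/null; then
        echo "tower block already present in $config" >&2
        return 0
    fi
    mkdir -p "$(dirname "$config")"
    cat >> "$config" <<EOF

tower {
    accessToken = '$token'
    enabled = true
}
EOF
}
```

The token is a credential, so keep the config file readable only by you if you store it this way.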

5. Download the singularity container

In theory, nf-core/atacseq should download the Singularity container automatically when it runs.
In practice, however, downloading it this way either takes way too long or fails entirely.
Therefore, per Narun Fancy's recommendation: "download the singularity image outside of the pipeline and save in the same dir as the cacheDir path for the singularity option in the custom config file". Here, that directory is:
/rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache
For more info on the -profile flag, see here.

6. Finally run the pipeline!

--input: Path to design file.
--genome: Genome build to align your FASTQ files to.
-profile: Path to container profile.

nextflow run nf-core/atacseq --input raw_data/HK5M2BBXY/design.csv --genome GRCh37 -r 1.2.1 -profile /rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache/atacseq_latest.sif
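
Before launching a long run, it can be worth checking that every FASTQ file listed in the design file actually exists. The sketch below is an illustration only: the function name `check_design` is hypothetical, and it assumes the design file uses the four-column layout `group,replicate,fastq_1,fastq_2` with a header row, as described in the nf-core/atacseq 1.x usage documentation. Relative paths are resolved from the directory you run it in.

```shell
# Report FASTQ paths in a design file that do not exist on disk.
# ASSUMPTION: columns are group,replicate,fastq_1,fastq_2 with a header.
# Usage: check_design raw_data/HK5M2BBXY/design.csv
check_design() {
    design=$1
    status=0
    {
        read -r _header    # skip the column names
        # The "|| [ -n ... ]" also handles a last line with no newline.
        while IFS=, read -r _group _replicate fastq_1 fastq_2 || [ -n "$_group" ]; do
            for fq in "$fastq_1" "$fastq_2"; do
                # fastq_2 is empty for single-end rows.
                [ -z "$fq" ] && continue
                if [ ! -f "$fq" ]; then
                    echo "missing: $fq" >&2
                    status=1
                fi
            done
        done
    } < "$design"
    return "$status"
}
```

The function returns a non-zero status if any file is missing, so it can gate the pipeline call, e.g. `check_design design.csv && nextflow run ...`.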

henipipe

Platform: python

CUTTAG_tutorial

Platform: workflowr (R + CLI)

CUT&RUNTools

Platform: CLI

Documentation

Excerpts from the full BRC Genome help page

File name

Illumina uses the following file name convention for the output fastq files

For example: samplename_S1_L001_R1_001.fastq.gz

samplename : Name of the sample provided in the samplesheet
S1 : Number of sample based on the sample order on the samplesheet
L001 : Lane number of the flowcell
R1 : The read, e.g. R1 indicates Read 1 and R2 indicates Read 2 of a paired-end run
001 : It's always 001
.fastq.gz : File extension. It's a gzipped fastq file

Please check the Illumina BCL2Fastq documentation for more information.
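
The naming convention above can be unpacked with plain shell parameter expansion, which is handy when pairing R1/R2 files or building a design file. This is a minimal sketch; the function name `parse_fastq_name` is hypothetical, and it assumes the file name follows the convention exactly. Fields are peeled off from the right, so sample names containing underscores are handled.

```shell
# Split an Illumina FASTQ file name into its five fields and print them
# space-separated: sample, sample number, lane, read, final chunk.
# Usage: parse_fastq_name samplename_S1_L001_R1_001.fastq.gz
parse_fastq_name() (
    base=${1##*/}                 # drop any leading directory
    stem=${base%.fastq.gz}        # drop the extension
    # If nothing was removed, the extension was not .fastq.gz.
    [ "$stem" != "$base" ] || exit 1
    chunk=${stem##*_};      stem=${stem%_*}     # 001
    read_id=${stem##*_};    stem=${stem%_*}     # R1 or R2
    lane=${stem##*_};       stem=${stem%_*}     # L001
    sample_num=${stem##*_}; sample=${stem%_*}   # S1, then the sample name
    echo "$sample $sample_num $lane $read_id $chunk"
)
```

For the example file name, this prints `samplename S1 L001 R1 001`.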
