Closed ewallace closed 2 years ago
The paper refers extensively to a published protocol. Linkers, from "Selective Ribosome Profiling to study interactions of translating ribosomes in yeast":
- Linker 3-L1 with 5′ adenylation and 3′ dideoxy-Cytidine, unique molecular identifiers (’NN…’) (IDT, RNase-free HPLC purification): 5′-/5rApp/NNNNNATCGTAGATCGGAAGAGCACACGTCTGAA/3ddC/-3′
- Linker reverse transcription L(rt) with 5′ phosphorylation, unique molecular identifiers (’NN…’) (IDT, RNase-free HPLC purification): 5′-/5Phos/NNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG/iSp18/GTGACTGGAGTTCAGACGTGTGCTC-3′
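That means each read carries a 2 nt UMI at its 5′ end and, once the adapter is removed, a 5 nt UMI at its 3′ end, so the UMI regular expression is: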
^(?P<umi_1>.{2}).+(?P<umi_2>.{5})$
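To see what this pattern captures, here is a minimal sketch in Python, whose `re` module accepts the same named-group syntax. The helper name `split_umis` and the example read are made up for illustration; this is not the pipeline's own extraction code, and it assumes the adapter has already been trimmed from the read.

```python
import re

# UMI pattern from above: 2 nt at the 5' end, 5 nt at the 3' end.
UMI_REGEX = re.compile(r"^(?P<umi_1>.{2}).+(?P<umi_2>.{5})$")


def split_umis(trimmed_read):
    """Return (umi_1, insert, umi_2) for an adapter-trimmed read, or None."""
    match = UMI_REGEX.match(trimmed_read)
    if match is None:
        return None  # too short to hold both UMIs plus at least 1 nt of insert
    insert = trimmed_read[match.end("umi_1"):match.start("umi_2")]
    return match.group("umi_1"), insert, match.group("umi_2")


# Made-up read for illustration only: 2 nt UMI + 10 nt insert + 5 nt UMI.
example = "AC" + "GGGTTTAAAC" + "TGCAT"
print(split_umis(example))
```

Because `.+` needs at least one character, any trimmed read shorter than 8 nt fails to match and returns `None`.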
Data were shared by Gunter Kramer and are on the lab datastore at wallace_rna/bigdata/2021/Bfr1_ribosomeprofiling_Kramer. We will need to copy them over to Eddie, collate the files and remove the spaces from filenames.
First we logged in to an Eddie staging node to download the data.
$ fastq_dir=/exports/csce/datastore/biology/groups/wallace_rna/bigdata/2021/Bfr1_ribosomeprofiling_Kramer
$ ls -l ${fastq_dir}
total 4984193
-rw------- 1 ewallac2 Domain Users 165 May 11 11:05 ~$Bfr data_Bfr1_RP.xlsx
-rw------- 1 ewallac2 Domain Users 8648 Jun 24 2021 Bfr_data_Bfr1_RP.xlsx
-rw------- 1 ewallac2 Domain Users 761579861 Jun 24 2021 Bfr_data_DeltaBfrRep1.fastq.gz
-rw------- 1 ewallac2 Domain Users 388571886 Jun 24 2021 Bfr_data_DeltaBfrRep2_1.fastq.gz
-rw------- 1 ewallac2 Domain Users 461166456 Jun 24 2021 Bfr_data_DeltaBfrRep2_2fastq.gz
-rw------- 1 ewallac2 Domain Users 554229827 Jun 25 2021 Bfr_data_DeltaBfrRep2_3.fastq.gz
-rw------- 1 ewallac2 Domain Users 600514575 Jun 25 2021 Bfr_data_DeltaBfrRep2_4.fastq.gz
-rw------- 1 ewallac2 Domain Users 784814528 Jun 24 2021 Bfr_data_DeltaBfrRep2.fastq.gz
-rw------- 1 ewallac2 Domain Users 467822708 Jun 24 2021 Bfr_data_wtRep1.fastq.gz
-rw------- 1 ewallac2 Domain Users 1084257521 Jun 24 2021 Bfr_data_wtRep2.fastq.gz
-rw------- 1 ewallac2 Domain Users 449 Jun 25 2021 md5sums.txt
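The listing includes five files whose names begin with Bfr_data_DeltaBfrRep2. As a rough check of how much data that is, the sketch below totals their sizes in Python. The byte counts are copied from the listing; treating all five files as runs of the same sample is an assumption based on the filenames.

```python
# File sizes in bytes, copied from the `ls -l` listing above.
delta_bfr_rep2_sizes = {
    "Bfr_data_DeltaBfrRep2.fastq.gz": 784814528,
    "Bfr_data_DeltaBfrRep2_1.fastq.gz": 388571886,
    "Bfr_data_DeltaBfrRep2_2fastq.gz": 461166456,
    "Bfr_data_DeltaBfrRep2_3.fastq.gz": 554229827,
    "Bfr_data_DeltaBfrRep2_4.fastq.gz": 600514575,
}

GIB = 1024 ** 3  # bytes per GiB

single_gib = delta_bfr_rep2_sizes["Bfr_data_DeltaBfrRep2.fastq.gz"] / GIB
merged_gib = sum(delta_bfr_rep2_sizes.values()) / GIB

print(f"single file: {single_gib:.2f} GiB, all five files: {merged_gib:.2f} GiB")
```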
Note that, as Bfr_data_Bfr1_RP.xlsx says of sample DeltaBfrRep2, "This sample was sequenced multiple times to get more reads, so you have to merge these files." For now I will NOT merge these files because I don't think an initial analysis needs 2.5GB of reads rather than 0.7GB. We could revisit that if more depth is needed for an analysis later.
To copy these to the fastq datafiles in the group storage space on Eddie, we adapted Flic's standard abbreviation CB-Sc-Bfr1_2019 (for Castells-Ballester, S. cerevisiae, Bfr1, 2019):
$ fastq_dir_eddie=/exports/csce/eddie/biology/groups/wallace_rna/fastq-datafiles/CB-Sc-Bfr1_2019
$ mkdir ${fastq_dir_eddie}
$ cp ${fastq_dir}/*Rep1.fastq.gz ${fastq_dir_eddie}
$ cp ${fastq_dir}/*Rep2.fastq.gz ${fastq_dir_eddie}
$ ls -l ${fastq_dir_eddie}
total 3026176
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 761579861 May 11 11:29 Bfr_data_DeltaBfrRep1.fastq.gz
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 784814528 May 11 11:30 Bfr_data_DeltaBfrRep2.fastq.gz
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 467822708 May 11 11:29 Bfr_data_wtRep1.fastq.gz
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 1084257521 May 11 11:30 Bfr_data_wtRep2.fastq.gz
Next, create a downsampled dataset to use for troubleshooting. Let's take the initial 100,000 (10^5) reads from wtRep1. This means we decompress, take the initial 400,000 lines of the file (because each read takes 4 lines in a fastq file, Amy), then compress again. Before that we print the first 5 reads (20 lines) to the terminal for a reality check, to confirm that the file really is in fastq format and that the adapter sequence is present.
$ gzip -dc ${fastq_dir_eddie}/Bfr_data_wtRep1.fastq.gz | head -n 20
@NB551333:149:H7LJGBGXB:1:11101:5518:1077 1:N:0:CGTACG
CGCCCCAAAATGGTTTTAGTTCNAGATTTATTGCTTTGGATCGTAGATCGG
+
6AAAAEEEEEEEEEEEEEEEAE#EEEEEEEE6EA//EEEAEEEEEE/EEEE
@NB551333:149:H7LJGBGXB:1:11101:22122:1080 1:N:0:CGTACG
CCGTATGGAATCTAAACCATAGTTATGACGATTGCTCTTGGTAATCGTAGA
+
AAAAAAEEEEAEAEEEEAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@NB551333:149:H7LJGBGXB:1:11101:14056:1080 1:N:0:CGTACG
CGTTTTCCACGTTCTAGCATTCAAGGTCCCTTAGCATCGTAGATCGGAAGA
+
AAAAAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEAEEEEE
@NB551333:149:H7LJGBGXB:1:11101:21014:1083 1:N:0:CGTACG
AGTACGCGAAACTCAGGTGCTGCAATCTGTAGAATCGTAGATCGGAAGAGC
+
AAAAAEEEEEEE/EEE/EEEEEEEEEEAEEEEEEEE/AEEEE<EAEEEEE<
@NB551333:149:H7LJGBGXB:1:11101:11242:1083 1:N:0:CGTACG
CCATCGGGTTATGCGTGTGTTACATGAACTTAGTGGATAATCGTAGATCGG
+
AAAAAEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
$ gzip -dc ${fastq_dir_eddie}/Bfr_data_wtRep1.fastq.gz | head -n 400000 | gzip > ${fastq_dir_eddie}/Bfr_data_wtRep1_init100000.fastq.gz
$ ls -l ${fastq_dir_eddie}
total 3028864
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 761579861 May 11 11:29 Bfr_data_DeltaBfrRep1.fastq.gz
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 784814528 May 11 11:30 Bfr_data_DeltaBfrRep2.fastq.gz
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 467822708 May 11 11:29 Bfr_data_wtRep1.fastq.gz
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 2670212 May 11 11:53 Bfr_data_wtRep1_init100000.fastq.gz
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 1084257521 May 11 11:30 Bfr_data_wtRep2.fastq.gz
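The same downsampling can be sketched in Python as an alternative to the gzip and head pipeline. The function name `take_initial_reads` is hypothetical, and the sketch assumes a standard fastq file with exactly 4 lines per read (no wrapped sequence lines).

```python
import gzip
from itertools import islice

LINES_PER_READ = 4  # header, sequence, '+', quality string


def take_initial_reads(src_path, dest_path, n_reads):
    """Copy the first n_reads records of a gzipped fastq file to dest_path.

    Returns the number of lines requested (n_reads * 4).
    """
    n_lines = n_reads * LINES_PER_READ
    with gzip.open(src_path, "rt") as src, gzip.open(dest_path, "wt") as dest:
        # islice stops after n_lines, so the rest of the file is never read.
        dest.writelines(islice(src, n_lines))
    return n_lines
```

Calling `take_initial_reads("Bfr_data_wtRep1.fastq.gz", "Bfr_data_wtRep1_init100000.fastq.gz", 100_000)` would request 400,000 lines, matching the `head -n 400000` used above.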
Logged on freshly to an Eddie interactive session with
$ qlogin -pe interactivemem 4 -l h_vmem=4G
...
(base) [ewallac2@node1h20(eddie) ~]$
This logged us in to node1h20, where @asnewell previously had problems? If anything weird happens, we can try to log in to a different interactive node.
Next, set up the riboviz environment and check that the example-datasets repository is up to date and on the right branch.
$ source set-riboviz-env.sh
$ cd riboviz/example-datasets/
$ git pull
$ git checkout Castells-Bfr1-107
Branch Castells-Bfr1-107 set up to track remote branch Castells-Bfr1-107 from origin.
Switched to a new branch 'Castells-Bfr1-107'
Next we try to construct a command to run on the subsampled data. (Note: we are running on a version of CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml at commit 9b2b997, which has the full-size datasets commented out and only processes the sample sub: Bfr_data_wtRep1_init100000.fastq.gz.)
First we navigate to riboviz/riboviz/ and symlink the config file.
$ cd /home/$USER/riboviz/riboviz
$ ln -s /home/$USER/riboviz/example-datasets/fungi/saccharomyces/CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml
$ nextflow run prep_riboviz.nf \
-params-file CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml \
-work-dir /exports/eddie/scratch/${USER}/work \
-ansi-log false --validate_only
N E X T F L O W ~ version 20.04.1
Launching `prep_riboviz.nf` [backstabbing_mahavira] - revision: acd7535f8d
Validating configuration only
samples_dir: .
organisms_dir: .
data_dir: .
No such directory (dir_in): ./input
That failed because we didn't specify the input directory. The next step is to specify the correct input directories. We will first try environment variables, as described in "environment variables and configuration tokens".
We create a RIBOVIZ_SAMPLES directory, and then symlink the fastq file directory to ${RIBOVIZ_SAMPLES}/input:
$ RIBOVIZ_SAMPLES=/exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
$ mkdir ${RIBOVIZ_SAMPLES}
$ cd ${RIBOVIZ_SAMPLES}
$ ln -s /exports/csce/eddie/biology/groups/wallace_rna/fastq-datafiles/CB-Sc-Bfr1_2019 input
$ cd /home/$USER/riboviz/riboviz
$ ls ${RIBOVIZ_SAMPLES}/input
Bfr_data_DeltaBfrRep1.fastq.gz Bfr_data_wtRep1.fastq.gz Bfr_data_wtRep2.fastq.gz
Bfr_data_DeltaBfrRep2.fastq.gz Bfr_data_wtRep1_init100000.fastq.gz
$ export RIBOVIZ_SAMPLES=/exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
$ export RIBOVIZ_ORGANISMS=/home/${USER}/riboviz/example-datasets/fungi/saccharomyces
$ export RIBOVIZ_DATA=/home/${USER}/riboviz/riboviz/data
$ nextflow run prep_riboviz.nf \
-params-file CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml \
-work-dir /exports/eddie/scratch/${USER}/work \
-ansi-log false --validate_only
N E X T F L O W ~ version 20.04.1
Launching `prep_riboviz.nf` [exotic_colden] - revision: acd7535f8d
Validating configuration only
samples_dir: /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
organisms_dir: /home/ewallac2/riboviz/example-datasets/fungi/saccharomyces
data_dir: /home/ewallac2/riboviz/riboviz/data
Validated configuration
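The difference between the two validation runs can be illustrated with a small Python sketch of token substitution. This is not riboviz's implementation. It assumes that the config holds an entry like `dir_in: ${RIBOVIZ_SAMPLES}/input` (the key name comes from the earlier error message, the value is a guess) and that an unset variable falls back to `.`, as the first run's `samples_dir: .` suggests.

```python
import re

# Matches tokens such as ${RIBOVIZ_SAMPLES} inside a config value.
TOKEN = re.compile(r"\$\{(RIBOVIZ_[A-Z]+)\}")


def resolve_tokens(value, env):
    """Replace ${RIBOVIZ_...} tokens with values from env, defaulting to '.'."""
    return TOKEN.sub(lambda m: env.get(m.group(1), "."), value)


# Assumed config value, for illustration only.
dir_in = "${RIBOVIZ_SAMPLES}/input"

samples = "/exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019"
print(resolve_tokens(dir_in, {}))                             # variable unset
print(resolve_tokens(dir_in, {"RIBOVIZ_SAMPLES": samples}))   # variable exported
```

With the variable unset the value resolves to `./input`, the directory reported missing in the first validation attempt. With it exported, the value resolves to the symlinked input directory.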
Victory! We can find all the files. What will happen when we try to run the actual dataset? Same command as above but without --validate_only, same export variables.
$ nextflow run prep_riboviz.nf \
-params-file CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml \
-work-dir /exports/eddie/scratch/${USER}/work \
-ansi-log false
N E X T F L O W ~ version 20.04.1
Launching `prep_riboviz.nf` [adoring_stone] - revision: acd7535f8d
samples_dir: /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
organisms_dir: /home/ewallac2/riboviz/example-datasets/fungi/saccharomyces
data_dir: /home/ewallac2/riboviz/riboviz/data
[4e/99de94] Submitted process > buildIndicesrRNA (yeast_rRNA)
[ba/add629] Submitted process > cutAdapters (sub)
[9b/5cfd9c] Submitted process > buildIndicesORF (yeast_CDS_w_250)
[e2/efbf15] Submitted process > createVizParamsConfigFile
[26/276bc0] Submitted process > createInteractiveVizParamsConfigFile
[f6/0d2405] Submitted process > extractUmis (sub)
[87/294e1e] Submitted process > hisat2rRNA (sub)
[ed/69c0b5] Submitted process > hisat2ORF (sub)
[cf/95701b] Submitted process > trim5pMismatches (sub)
[a6/9f4835] Submitted process > samViewSort (sub)
[45/36b2f8] Submitted process > outputBams (sub)
[f2/2912c1] Submitted process > makeBedgraphs (sub)
[23/6174c2] Submitted process > bamToH5 (sub)
[56/991285] Submitted process > generateStatsFigs (sub)
...
This ran and looked like it was working, but then bamToH5 (sub) was very slow (30mins?) and generateStatsFigs (sub) was even slower (45mins+). We suspect it got stuck. Abandoning for now. Will try again on a different node, probably. I checked that the output files from earlier parts looked about right with:
$ ls -l ${RIBOVIZ_SAMPLES}/output/sub
total 53888
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 135951 May 11 16:11 minus.bedgraph
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 1212439 May 11 16:11 plus.bedgraph
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 1660424 May 11 16:11 sub.bam
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 320040 May 11 16:11 sub.bam.bai
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 278439 May 11 16:24 sub.h5
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 51008296 May 11 16:24 sub.h5.1
Tried again from screen (remote session). Again logged in to node1h20.
$ source set-riboviz-env.sh
$ cd /home/$USER/riboviz/riboviz
$ export RIBOVIZ_SAMPLES=/exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
$ export RIBOVIZ_ORGANISMS=/home/${USER}/riboviz/example-datasets/fungi/saccharomyces
$ export RIBOVIZ_DATA=/home/${USER}/riboviz/riboviz/data
$ nextflow run prep_riboviz.nf \
-params-file CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml \
-work-dir /exports/eddie/scratch/${USER}/work \
-ansi-log false
N E X T F L O W ~ version 20.04.1
Launching `prep_riboviz.nf` [happy_albattani] - revision: acd7535f8d
samples_dir: /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
organisms_dir: /home/ewallac2/riboviz/example-datasets/fungi/saccharomyces
data_dir: /home/ewallac2/riboviz/riboviz/data
[7d/aa260c] Submitted process > buildIndicesORF (yeast_CDS_w_250)
[fb/b90f10] Submitted process > cutAdapters (sub)
[07/3d600c] Submitted process > buildIndicesrRNA (yeast_rRNA)
[2e/3646ae] Submitted process > createVizParamsConfigFile
[df/96b320] Submitted process > createInteractiveVizParamsConfigFile
[8b/2a3ec3] Submitted process > extractUmis (sub)
[89/562dd9] Submitted process > hisat2rRNA (sub)
[cf/8ef719] Submitted process > hisat2ORF (sub)
[62/ac3981] Submitted process > trim5pMismatches (sub)
[04/8376f6] Submitted process > samViewSort (sub)
[eb/d73e08] Submitted process > outputBams (sub)
[ed/6c635e] Submitted process > makeBedgraphs (sub)
[98/15bc0c] Submitted process > bamToH5 (sub)
... # did not copy generateStatsFigs line properly here.
Finished processing sample: sub
[bc/a5f6ed] Submitted process > staticHTML (sub)
[cd/7fd80e] Submitted process > renameTpms (sub)
[e1/dcf2da] Submitted process > collateTpms (sub)
Finished visualising sample: sub
[c0/50af69] Submitted process > countReads
Workflow finished! (OK)
Again, sprinted through the first steps and slowed down on bamToH5. Left running ...
... about 4h later, checked in, finished, looks good.
$ ls -l ${RIBOVIZ_SAMPLES}/output/
total 260
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 342 May 11 17:38 interactive_viz_config.yaml
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 1601 May 11 18:58 read_counts_per_file.tsv
drwx--S--- 2 ewallac2 datastore_biology_groups_wallace_rna 2048 May 11 18:58 sub
-rw------- 1 ewallac2 datastore_biology_groups_wallace_rna 70225 May 11 18:57 TPMs_all_CDS_all_samples.tsv
$ more ${RIBOVIZ_SAMPLES}/output/read_counts_per_file.tsv
# Created by: riboviz
# Date: 2022-05-11 18:58:12.267001
# Command-line tool: /exports/eddie3_homes_local/ewallac2/riboviz/riboviz/riboviz/tools/count_reads.py
# File: /exports/eddie3_homes_local/ewallac2/riboviz/riboviz/riboviz/count_reads.py
# Version: commit cc97e742686617dea1d34d2387fa0e4d63a5f9d5 date 2022-05-09 23:38:36+02:00
SampleName Program File NumReads Description
sub input /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019/input/Bfr_data_wtRep1_init100000.fastq.gz 100000 input
sub cutadapt /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019/tmp/sub/trim.fq 99993 Reads after removal of sequencing library adapters
sub hisat2 /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019/tmp/sub/nonrRNA.fq 56844 Reads that did not align to rRNA or other contaminating reads in rRNA index files
sub hisat2 /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019/tmp/sub/rRNA_map.sam 43126 Reads aligned to rRNA and other contaminating reads in rRNA index files
sub hisat2 /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019/tmp/sub/unaligned.fq 21205 Unaligned reads removed by alignment of remaining reads to ORFs index files
sub hisat2 /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019/tmp/sub/orf_map.sam 41930 Reads aligned to ORFs index files
sub riboviz.tools.trim_5p_mismatch /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019/tmp/sub/orf_map_clean.sam 41930 Reads after trimming of 5' mismatches and removal of those with more than 2 mismatches
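For a quick sanity check of these counts, the sketch below expresses each step as a percentage of the input reads. The numbers are copied from the table; the short step labels are mine. Treat the result as a rough guide only, since counts taken from different file types may not be strictly comparable.

```python
# (step, reads) pairs copied from read_counts_per_file.tsv above.
read_counts = [
    ("input", 100000),
    ("after adapter removal (trim.fq)", 99993),
    ("not aligned to rRNA (nonrRNA.fq)", 56844),
    ("aligned to rRNA (rRNA_map.sam)", 43126),
    ("aligned to ORFs (orf_map.sam)", 41930),
    ("after 5' mismatch trimming (orf_map_clean.sam)", 41930),
]

n_input = read_counts[0][1]
percent_of_input = {step: 100 * n / n_input for step, n in read_counts}

for step, pct in percent_of_input.items():
    print(f"{step}: {pct:.1f}% of input")
```

On this subsample, a little over 40% of input reads end up aligned to ORFs and a similar share aligns to rRNA.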
I moved the subsampled output to a new directory, output_sub_init100000, and deleted tmp.
$ mv ${RIBOVIZ_SAMPLES}/output ${RIBOVIZ_SAMPLES}/output_sub_init100000
$ rm -rf ${RIBOVIZ_SAMPLES}/tmp
Updated config file in commit 8998b67, to process all samples with num_processes = 4.
Trying again on Eddie interactive node.
$ screen
$ qlogin -pe interactivemem 4 -l h_vmem=6G
...
Your interactive job 19545239 has been successfully scheduled.
(base) [ewallac2@node1h21(eddie) ~]$ source set-riboviz-env.sh
(riboviz) (base) [ewallac2@node1h21(eddie) ~]$ export PS1="$ " # just shorten command prompt
$ cd /home/$USER/riboviz/riboviz
$ export RIBOVIZ_SAMPLES=/exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
$ export RIBOVIZ_ORGANISMS=/home/${USER}/riboviz/example-datasets/fungi/saccharomyces
$ export RIBOVIZ_DATA=/home/${USER}/riboviz/riboviz/data
$ nextflow run prep_riboviz.nf \
-params-file CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml \
-work-dir /exports/eddie/scratch/${USER}/work \
-ansi-log false --validate_only
N E X T F L O W ~ version 20.04.1
Launching `prep_riboviz.nf` [golden_meucci] - revision: acd7535f8d
Validating configuration only
samples_dir: /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
organisms_dir: /home/ewallac2/riboviz/example-datasets/fungi/saccharomyces
data_dir: /home/ewallac2/riboviz/riboviz/data
Validated configuration
$ nextflow run prep_riboviz.nf \
-params-file CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml \
-work-dir /exports/eddie/scratch/${USER}/work \
-ansi-log false
N E X T F L O W ~ version 20.04.1
Launching `prep_riboviz.nf` [big_sax] - revision: acd7535f8d
samples_dir: /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
organisms_dir: /home/ewallac2/riboviz/example-datasets/fungi/saccharomyces
data_dir: /home/ewallac2/riboviz/riboviz/data
[8b/f05bb4] Submitted process > cutAdapters (DeltaBfr_Rep1)
[0d/e6fe6b] Submitted process > cutAdapters (WT_Rep2)
[95/e9202a] Submitted process > createVizParamsConfigFile
[1a/a3ddee] Submitted process > cutAdapters (DeltaBfr_Rep2)
[0c/375d74] Submitted process > buildIndicesrRNA (yeast_rRNA)
[a9/650ef4] Submitted process > buildIndicesORF (yeast_CDS_w_250)
[18/b318e8] Submitted process > cutAdapters (WT_Rep1)
[4d/bfcd68] Submitted process > createInteractiveVizParamsConfigFile
[a0/2c58b8] Submitted process > extractUmis (WT_Rep1)
[ef/cec40a] Submitted process > extractUmis (DeltaBfr_Rep1)
[a5/59383e] Submitted process > extractUmis (DeltaBfr_Rep2)
[7f/9f74ed] Submitted process > hisat2rRNA (WT_Rep1)
[29/b7bb27] Submitted process > extractUmis (WT_Rep2)
[2f/d70409] Submitted process > hisat2ORF (WT_Rep1)
[d9/bfc870] Submitted process > trim5pMismatches (WT_Rep1)
[dc/f06db8] Submitted process > samViewSort (WT_Rep1)
[e7/81a626] Submitted process > outputBams (WT_Rep1)
[8e/75ce5a] Submitted process > makeBedgraphs (WT_Rep1)
[1f/934ca5] Submitted process > bamToH5 (WT_Rep1)
[aa/9e6477] Submitted process > hisat2rRNA (DeltaBfr_Rep1)
[51/92d0ed] Submitted process > hisat2rRNA (DeltaBfr_Rep2)
[a8/f3aa7a] Submitted process > hisat2ORF (DeltaBfr_Rep1)
[76/369ef4] Submitted process > hisat2ORF (DeltaBfr_Rep2)
[98/5fac9d] Submitted process > trim5pMismatches (DeltaBfr_Rep1)
[61/4c81df] Submitted process > trim5pMismatches (DeltaBfr_Rep2)
[7b/ab79f7] Submitted process > samViewSort (DeltaBfr_Rep2)
[97/ba7c78] Submitted process > samViewSort (DeltaBfr_Rep1)
[0d/268c39] Submitted process > outputBams (DeltaBfr_Rep2)
[99/7b5df7] Submitted process > bamToH5 (DeltaBfr_Rep2)
[aa/87ad5e] Submitted process > makeBedgraphs (DeltaBfr_Rep2)
[1d/e17d58] Submitted process > generateStatsFigs (WT_Rep1)
[c3/701527] Submitted process > outputBams (DeltaBfr_Rep1)
[eb/daf151] Submitted process > makeBedgraphs (DeltaBfr_Rep1)
[8c/7c7cbe] Submitted process > bamToH5 (DeltaBfr_Rep1)
[4f/d2692a] Submitted process > generateStatsFigs (DeltaBfr_Rep2)
[6a/7cde88] Submitted process > hisat2rRNA (WT_Rep2)
[ff/999eeb] Submitted process > generateStatsFigs (DeltaBfr_Rep1)
[55/7847ad] Submitted process > hisat2ORF (WT_Rep2)
[4e/0526f3] Submitted process > trim5pMismatches (WT_Rep2)
[dd/f6946d] Submitted process > samViewSort (WT_Rep2)
[d5/a79501] Submitted process > outputBams (WT_Rep2)
[1f/a0d76d] Submitted process > bamToH5 (WT_Rep2)
[c1/d2f700] Submitted process > makeBedgraphs (WT_Rep2)
[40/fbf219] Submitted process > generateStatsFigs (WT_Rep2)
Finished processing sample: DeltaBfr_Rep2
[3a/b10216] Submitted process > renameTpms (DeltaBfr_Rep2)
[3f/56bbf1] Submitted process > staticHTML (DeltaBfr_Rep2)
Finished visualising sample: DeltaBfr_Rep2
Finished processing sample: DeltaBfr_Rep1
[a1/b5cbf5] Submitted process > renameTpms (DeltaBfr_Rep1)
[4a/6a697c] Submitted process > staticHTML (DeltaBfr_Rep1)
Finished visualising sample: DeltaBfr_Rep1
Finished processing sample: WT_Rep1
[97/3ede1e] Submitted process > renameTpms (WT_Rep1)
[d7/2b3a6f] Submitted process > staticHTML (WT_Rep1)
Finished visualising sample: WT_Rep1
Finished processing sample: WT_Rep2
[19/d0c3ea] Submitted process > renameTpms (WT_Rep2)
[77/d71ba3] Submitted process > staticHTML (WT_Rep2)
[97/3ab52b] Submitted process > collateTpms (DeltaBfr_Rep2, DeltaBfr_Rep1, WT_Rep1, WT_Rep2)
Finished visualising sample: WT_Rep2
[42/e8957b] Submitted process > countReads
Workflow finished! (OK)
$ nextflow log
...
2022-05-11 22:40:13 7h 45s big_sax OK acd7535f8d 357d19ec-f24e-483b-97f2-367ad679fb78 nextflow run prep_riboviz.nf -params-file CastellsBallester_2019_Bfr1_4samples_Scerevisiae.yaml -work-dir /exports/eddie/scratch/ewallac2/work -ansi-log false
Ran overnight in the end, about 7h (the log reports 7h 45s). Quite long!
Next: check the output.
We discussed how to view the output of a nextflow run. Unfortunately we realised that we forgot to set the option for an html report, -with-report. Still, we checked the logs:
$ nextflow log big_sax -f 'process,exit,hash,duration'
cutAdapters 0 8b/f05bb4 6m 51s
cutAdapters 0 0d/e6fe6b 11m 8s
createVizParamsConfigFile 0 95/e9202a 140ms
cutAdapters 0 1a/a3ddee 7m 15s
buildIndicesrRNA 0 0c/375d74 6.9s
buildIndicesORF 0 a9/650ef4 6.6s
cutAdapters 0 18/b318e8 3m 45s
createInteractiveVizParamsConfigFile 0 4d/bfcd68 216ms
extractUmis 0 a0/2c58b8 7m 7s
extractUmis 0 ef/cec40a 13m 9s
extractUmis 0 a5/59383e 14m 20s
hisat2rRNA 0 7f/9f74ed 3m 52s
extractUmis 0 29/b7bb27 36m 6s
hisat2ORF 0 2f/d70409 2m 16s
trim5pMismatches 0 d9/bfc870 1m 15s
samViewSort 0 dc/f06db8 1m 5s
outputBams 0 e7/81a626 2.2s
makeBedgraphs 0 8e/75ce5a 28.5s
bamToH5 0 1f/934ca5 17m 8s
hisat2rRNA 0 aa/9e6477 10m 21s
hisat2rRNA 0 51/92d0ed 13m 2s
hisat2ORF 0 a8/f3aa7a 4m 43s
hisat2ORF 0 76/369ef4 1m 26s
trim5pMismatches 0 98/5fac9d 1m 26s
trim5pMismatches 0 61/4c81df 32.2s
samViewSort 0 7b/ab79f7 18s
samViewSort 0 97/ba7c78 1m 18s
outputBams 0 0d/268c39 385ms
bamToH5 0 99/7b5df7 9m 36s
makeBedgraphs 0 aa/87ad5e 8.8s
generateStatsFigs 0 1d/e17d58 5h 38m 25s
outputBams 0 c3/701527 1.8s
makeBedgraphs 0 eb/daf151 54.2s
bamToH5 0 8c/7c7cbe 10m 59s
generateStatsFigs 0 4f/d2692a 4h 30m 8s
hisat2rRNA 0 6a/7cde88 8m 26s
generateStatsFigs 0 ff/999eeb 5h 21m 56s
hisat2ORF 0 55/7847ad 4m 28s
trim5pMismatches 0 4e/0526f3 1m 57s
samViewSort 0 dd/f6946d 1m 37s
outputBams 0 d5/a79501 1.9s
bamToH5 0 1f/a0d76d 14m 22s
makeBedgraphs 0 c1/d2f700 1m 6s
generateStatsFigs 0 40/fbf219 5h 28m 15s
renameTpms 0 3a/b10216 161ms
staticHTML 0 3f/56bbf1 42.9s
renameTpms 0 a1/b5cbf5 161ms
staticHTML 0 4a/6a697c 23.4s
renameTpms 0 97/3ede1e 50ms
staticHTML 0 d7/2b3a6f 17.1s
renameTpms 0 19/d0c3ea 57ms
staticHTML 0 77/d71ba3 17.5s
collateTpms 0 97/3ab52b 1.3s
countReads 0 42/e8957b 12m 58s
Exception in thread "main" java.lang.OutOfMemoryError: Metaspace
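To compare where the time went, the durations printed by nextflow log can be converted to seconds and summed per process. The sketch below is one way to do that in Python. The parser `parse_duration` is hypothetical and only handles the formats that appear in this log (for example `5h 38m 25s`, `6.9s`, `140ms`). Only a few rows are copied in, as an illustration.

```python
import re

UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed before "m" and "s" so that milliseconds are not misread.
PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_duration(text):
    """Convert a duration such as '5h 38m 25s', '6.9s' or '140ms' to seconds."""
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in PART.findall(text))


# A few (process, duration) rows copied from the log above.
log_rows = [
    ("generateStatsFigs", "5h 38m 25s"),
    ("generateStatsFigs", "4h 30m 8s"),
    ("generateStatsFigs", "5h 21m 56s"),
    ("generateStatsFigs", "5h 28m 15s"),
    ("extractUmis", "36m 6s"),
    ("bamToH5", "17m 8s"),
]

totals = {}
for process, duration in log_rows:
    totals[process] = totals.get(process, 0.0) + parse_duration(duration)

for process, seconds in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{process}: {seconds / 3600:.2f} h")
```

Summed over the four samples, generateStatsFigs accounts for about 21 hours of process time, far more than any other step in these rows.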
Log in to an Eddie staging node with qlogin -q staging.
$ export RIBOVIZ_SAMPLES=/exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019
$ OUTPUT_DATASTORE=/exports/csce/datastore/biology/groups/wallace_rna/data/2022/05-May/Edward/Bfr1_riboviz_2022-05-12
$ ls -l ${RIBOVIZ_SAMPLES}
total 4
drwx--S--- 2 ewallac2 datastore_biology_groups_wallace_rna 2048 May 11 22:40 index
lrwxrwxrwx 1 ewallac2 datastore_biology_groups_wallace_rna 78 May 11 15:52 input -> /exports/csce/eddie/biology/groups/wallace_rna/fastq-datafiles/CB-Sc-Bfr1_2019
drwx--S--- 6 ewallac2 datastore_biology_groups_wallace_rna 512 May 12 05:40 output
drwx--S--- 3 ewallac2 datastore_biology_groups_wallace_rna 512 May 11 18:58 output_sub_init100000
drwx--S--- 6 ewallac2 datastore_biology_groups_wallace_rna 512 May 11 22:51 tmp
$ du -sh ${RIBOVIZ_SAMPLES}/
1.3G /exports/csce/eddie/biology/groups/wallace_rna/20221105_CB-Sc-Bfr1_2019/
$ cp -r ${RIBOVIZ_SAMPLES}/output_sub_init100000 ${OUTPUT_DATASTORE}
$ cp -r ${RIBOVIZ_SAMPLES}/output ${OUTPUT_DATASTORE}
This means we are done with Eddie for a bit and will inspect the output files locally.
We checked the outputs.
Need to push branch to example-datasets.
Add new dataset to look at the effect of the RNA-binding protein Bfr1 on translation in S. cerevisiae. Note that the data were shared privately by Gunter Kramer, and are not on SRA as far as I know.
Castells-Bfr1-107