renberg closed this issue 2 years ago
Hi
Well, there seem to be multiple issues. First of all, it tells you that you are running pipeline version null. This should be something like 2.3.2; it is defined in the global config file. So which version of MPRAflow are you running?
Then it seems there is an issue with your experiment file. Nextflow is not able to create runs and replicates from it. Can you post this file?
Using your own association file is absolutely fine and not the error.
I noticed the null version thing too and don't know what to make of it. Where can I find the global config file? Sorry, I don't have much of a computer science background, and I really appreciate the help.
Here's my experiment file, we only have a single replicate for this small scale study. 36crs_experiment.csv
Thank you for providing the experiment file. It tells me that your header is wrong. Have a look at the documentation. It should be exactly: Condition,Replicate,DNA_BC_F,DNA_UMI,DNA_BC_R,RNA_BC_F,RNA_UMI,RNA_BC_R
The ordering is not important (see code here), but the names are! I think your order is R1,R2,I1, so you can use the header Condition,Replicate,DNA_BC_F,DNA_BC_R,DNA_UMI,RNA_BC_F,RNA_BC_R,RNA_UMI
Again the question: which version are you using? What did you download or check out? The conf/global.conf file should show the version.
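To make the header requirement above concrete, here is a minimal Python sketch that compares an experiment file's header with the eight column names quoted in this reply. It is not part of MPRAflow. The function name check_header and the example rows (including the misnamed RNA_I1 column and the fastq file names) are made up for illustration.

```python
import csv
import io

# Column names quoted in the reply above; their order is not significant.
REQUIRED_COLUMNS = {
    "Condition", "Replicate",
    "DNA_BC_F", "DNA_UMI", "DNA_BC_R",
    "RNA_BC_F", "RNA_UMI", "RNA_BC_R",
}


def check_header(handle):
    """Return (missing, unexpected) column names for an experiment CSV."""
    reader = csv.reader(handle)
    header = {name.strip() for name in next(reader)}
    missing = sorted(REQUIRED_COLUMNS - header)
    unexpected = sorted(header - REQUIRED_COLUMNS)
    return missing, unexpected


# Hypothetical file content: R1,R2,I1 column order with one misnamed column.
example = io.StringIO(
    "Condition,Replicate,DNA_BC_F,DNA_BC_R,DNA_UMI,RNA_BC_F,RNA_BC_R,RNA_I1\n"
    "36CRS,1,a.fastq.gz,b.fastq.gz,c.fastq.gz,d.fastq.gz,e.fastq.gz,f.fastq.gz\n"
)
missing, unexpected = check_header(example)
print("missing:", missing)
print("unexpected:", unexpected)
```

To check a real file, pass an open file handle instead of the StringIO object. Both lists come back empty when the header matches, whatever the column order.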
Thanks for pointing out my experiment file mistake. I was using the header from your Nature communications paper example.
Found global config file, it says MPRAflow version 2.3.5, and nextflow version required 20.10
Experiment file seems to have fixed that problem, it runs now!
However, it stopped toward the end and gave me the following:
WARN: Killing running tasks (1)
executor > slurm (12)
[1a/ec4274] process > create_BAM (make idx) [100%] 2 of 2 ✔
[b4/27ea54] process > raw_counts (1) [100%] 2 of 2 ✔
[4c/aa988c] process > filter_counts (2) [100%] 2 of 2 ✔
[9c/e6dc52] process > final_counts (2) [100%] 2 of 2 ✔
[da/112811] process > dna_rna_merge_counts (1) [100%] 1 of 1 ✔
[38/b0feb5] process > dna_rna_merge (1) [100%] 1 of 1 ✔
[9e/c4adc6] process > calc_correlations (1) [ 0%] 0 of 1
[ff/c340fa] process > make_master_tables (1) [ 0%] 0 of 1
Error executing process > 'calc_correlations (1)'
Caused by:
Missing output file(s) `*_correlation.txt` expected by process `calc_correlations (1)`
Command executed:
Rscript /home/renberg/MPRAflow/src/plot_perInsertCounts_correlation.R 36CRS NA 10 36CRS_1_counts.tsv 1
Command exit status:
0
Command output:
File Replicate Condition
1 36CRS_1_counts.tsv 1 36CRS
[1] "hist"
name dna_count rna_count ratio log2 n_obs_bc
1 MiniPromoter 7961.529 4659.243 0.5852197 -0.77294985 1
2 SCP_1 17874.379 17406.864 0.9738444 -0.03823685 1
3 CMV_Promoter 13447.450 5440.661 0.4045868 -1.30547875 1
4 TNNT2_Promoter_Full 23967.918 108108.753 4.5105609 2.17330684 1
5 TNNT2_Promoter_Minimal 11586.403 14480.612 1.2497935 0.32168979 1
6 TNNT2_Promoter_Micro1 21363.842 17141.334 0.8023526 -0.31769173 1
[1] 1 1 1 1 1 1
[1] "boxplot"
name log2
1
2 MiniPromoter -0.772949852586751
3 SCP_1 -0.0382368450302552
4 CMV_Promoter -1.30547874517947
5 TNNT2_Promoter_Full 2.17330684394993
6 TNNT2_Promoter_Minimal 0.321689790304395
[1] "merged"
name log2 label
1 NA
2 MiniPromoter -0.772949852586751 NA
3 SCP_1 -0.0382368450302552 NA
4 CMV_Promoter -1.30547874517947 NA
5 TNNT2_Promoter_Full 2.17330684394993 NA
6 TNNT2_Promoter_Minimal 0.321689790304395 NA
name log2 label
25 TNNT2_Promoter_Minimal_MYBPC3intronEnh -2.1637874 NA
14 ACTC1_Promoter_Full -1.8552410 NA
15 ACTC1_Promoter_Minimal -1.5105718 NA
4 CMV_Promoter -1.3054787 NA
36 TNNT2_Promoter_Micro3withPromTFBSinIntronEnh -0.8386609 NA
2 MiniPromoter -0.7729499 NA
'data.frame': 36 obs. of 3 variables:
$ name : Factor w/ 37 levels "","ACTC1_Promoter_Full",..: 34 2 3 7 19 8 18 17 5 21 ...
$ log2 : num -2.164 -1.855 -1.511 -1.305 -0.839 ...
$ label: Factor w/ 1 level "NA": 1 1 1 1 1 1 1 1 1 1 ...
NULL
png
2
png
2
Command error:
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Am I correct that it is upset that I only have one replicate, so it gets confused when it tries to calculate correlation across replicates? It seems to have produced all of the files in the output folder for the replicate, but not the various plots that would show all replicates if there were more than one.
More importantly, are there any critical steps toward the end of the pipeline that could affect my data, or can I just use what it gave me in the replicate folder?
Thanks again!!!
Yes, I think so too. In the past we did not have data without replicates.
The two missing steps are not important. You should have everything you need. The correlation plots are missing (they cannot be computed without replicates), as is the master table, which is a table that combines counts, BCs and expression fold changes from all replicates.
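As a rough illustration of why a single replicate leaves nothing to correlate, the Python sketch below lists the replicate pairs that a pairwise comparison would need. It is not the logic of plot_perInsertCounts_correlation.R. It only assumes that correlations are computed between pairs of replicates, and the helper name replicate_pairs is hypothetical.

```python
from itertools import combinations


def replicate_pairs(replicates):
    """Return every unordered pair of replicate labels."""
    return list(combinations(sorted(replicates), 2))


# One replicate, as in this run: there is no pair to compare.
print(replicate_pairs(["1"]))

# Three replicates would give three pairwise comparisons.
print(replicate_pairs(["1", "2", "3"]))
```

With no pairs, a step that writes one correlation result per pair has nothing to write, which would be consistent with the "Missing output file(s)" message even though the R script exited with status 0.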
I saw someone else producing the same error, but mine appears to be different so I'm hoping someone can help. I've been struggling to get this pipeline to work for a few weeks now. Here is my output from the latest failed run:
Here is the command I am running (as part of a slurm script): nextflow run count.nf --experiment-file "36crs/36crs_experiment.csv" --dir "36crs/6146-AR/fastqs_6146-AR" --outdir "36crs/outputs" --association "36crs/fake_assoc_dict.p" --design "36crs/data/CRS.fa"
For additional possibly pertinent background, I made the association and design files from scratch instead of using the association function because we used known unique barcodes for our 36 inserts and did not need to do an initial sequencing run for barcode association.
Thanks for your help.