Closed laugon10 closed 3 years ago
Hi Laura, I recently fixed this bug, which is related to a Delly version update resulting in changed IO formats.
Could you please try the code in this development repository and check if it works? https://github.com/roland-rad-lab/MoCaSeq/tree/human-pipeline/repository
The most important changes are Chromothripsis_Delly_Annotate-Filter.R as well as lines 1306-1396 in the main script.
Good luck
Hi, I have tried with this version. And now I obtained an empty filtered file. Are you sure it is compatible with the old version of mocaseq? Should I re-run something?
Thank you very much, Laura
It should be compatible with the old version, but I did not fully test it.
The new code should look like this:
coverage=$(sh $repository_dir/Chromothripsis_GetCoverage.sh $name)
java -jar $snpeff_dir/SnpSift.jar extractFields \
$name/results/Delly/$name.pre.vcf \
CHROM POS CHR2 POS2 END PE SR PRECISE IMPRECISE \
GEN[Tumor].DR GEN[Tumor].CN GEN[Tumor].RR GEN[Tumor].RV \
GEN[Normal].DR GEN[Normal].CN GEN[Normal].RR GEN[Normal].RV \
MAPQ CT \
> /var/pipeline/${name}/results/Delly/${name}.breakpoints.tab
Rscript $repository_dir/Chromothripsis_Delly_Annotate-Filter.R \
-n $name \
-i $name/results/Delly/$name.breakpoints.tab \
-o T \
-c $coverage \
-v 0.2 \
-d 6000
Which version of Delly and MoCaSeq are you running?
Is the unfiltered file (Delly/NAME.breakpoints.tab) also empty? If not, could you please show me the first few lines of that file?
Hi, Yes, I tried exactly those lines. I just corrected the coverage to 33 since I am not doing the full mocaseq pipe.
Delly2 v0.8.3. For Mocaseq I installed it with the latest -> I have not been able to find the version in any file. But since I have install it in October 2020 I guess it is v0.4.53 (master).
This is the beginning of the Delly/name.breakpoints.tab file:
CHROM POS CHR2 POS2 END PE SR PRECISE IMPRECISE GEN[Tumor].DR GEN[Tumor].CN GEN[Tumor].RR GEN[Tumor].RV GEN[Normal].DR GEN[Normal].CN GEN[Normal].RR GEN[Normal].RV MAPQ CT
1 3003408 81643743 4 false true 22 4 0 0 28 4 0 0 1 5to3
1 3003441 3799048 2 false true 28 4 0 0 21 4 0 0 1 5to3
1 3153301 153080616 2 false true 31 6 0 0 31 7 0 0 1 3to3
1 3232541 7985537 6 false true 32 4 0 0 40 4 0 0 1 5to3
1 3233304 100472635 6 false true 39 4 0 0 44 4 0 0 1 3to3
1 3305491 103894906 2 false true 28 4 0 0 31 4 0 0 1 3to3
1 3344038 182182277 5 false true 39 24 0 0 42 25 0 0 1 5to3
Thank you very much, Laura
So the versions as well as the data format is fine.
I assume one of the filter cutoffs is removing all lines.
These are the main 3 filter steps (coverage and MAPQ is only applied later):
Could you please manually run the Chromothripsis_Delly_Annotate-Filter.R script?
You can set this
setwd("PATH_BEFORE_NAME_FOLDERS")
name <- "NAME"
input <- paste0(name, "/results/Delly/",name,".breakpoints.tab")
FilterInterChromosomalRearrangements <- T
CoverageFilter <- 33
TumorVarFreq <- 0.2
maxDist <- 6000
... and then start at line 40 and stop at line 85, showing which filter(s) is removing everything:
#inputDT[control.varFreq == 0 & tumor.varFreq >= TumorVarFreq & distances >= maxDist] # original line
inputDT[control.varFreq == 0]
inputDT[tumor.varFreq >= TumorVarFreq]
inputDT[distances >= maxDist]
Another way would be to show me all the ranges, which tells us exactly which column is failing the filter:
inputDT[, sapply(.SD, range)]
scriptFilterDelly.txt Hi, I have run the code I am attaching (scriptFilterDelly.R).
I have print a table before the final filtering step that look like this:
head inputDTprefilter
"donChr" "donPos" "accChr" "accPos" "PE" "SR" "PRECISE" "IMPRECISE" "tumor.DR" "tumor.DV" "tumor.RR" "tumor.RV" "control.DR" "control.DV" "control.RR" "control.RV" "MAPQ" "Type" "control.pairs.ratio" "control.SR.ratio" "tumor.pairs.ratio" "tumor.SR.ratio" "control.coverage" "tumor.coverage" "control.varFreq" "tumor.varFreq" "distances"
"1" 3003408 NA NA 4 NA FALSE TRUE 22 4 0 0 28 4 0 0 1 "5to3" 0.125 0 0.153846153846154 0 32 26 0.125 0.153846153846154 NA
"1" 3003441 NA NA 2 NA FALSE TRUE 28 4 0 0 21 4 0 0 1 "5to3" 0.16 0 0.125 0 25 32 0.16 0.125 NA
"1" 3153301 NA NA 2 NA FALSE TRUE 31 6 0 0 31 7 0 0 1 "3to3" 0.184210526315789 0 0.162162162162162 0 38 37 0.184210526315789 0.162162162162162 NA
"1" 3232541 NA NA 6 NA FALSE TRUE 32 4 0 0 40 4 0 0 1 "5to3" 0.0909090909090909 0 0.111111111111111 0 44 36 0.0909090909090909 0.111111111111111 NA
"1" 3233304 NA NA 6 NA FALSE TRUE 39 4 0 0 44 4 0 0 1 "3to3" 0.0833333333333333 0 0.0930232558139535 0 48 43 0.0833333333333333 0.0930232558139535 NA
"1" 3305491 NA NA 2 NA FALSE TRUE 28 4 0 0 31 4 0 0 1 "3to3" 0.114285714285714 0 0.125 0 35 32 0.114285714285714 0.125 NA
"1" 3344038 NA NA 5 NA FALSE TRUE 39 24 0 0 42 25 0 0 1 "5to3" 0.373134328358209 0 0.380952380952381 0 67 63 0.373134328358209 0.380952380952381 NA
"1" 3452589 NA NA 2 NA FALSE TRUE 42 4 0 0 45 4 0 0 4 "3to5" 0.0816326530612245 0 0.0869565217391304 0 49 46 0.0816326530612245 0.0869565217391304 NA
"1" 3612939 NA NA 2 NA FALSE TRUE 41 26 0 0 48 27 0 0 1 "3to3" 0.36 0 0.388059701492537 0 75 67 0.36 0.388059701492537 NA
But with the final filtering step I have obtained an empty table.
# FILTERING
if(FilterInterChromosomalRearrangements){
print(paste("Filtering only intrachromosomal rearrangements"))
inputDT <- inputDT[donChr==accChr]
}
This is how my CNVs plot look like, so I am expecting something in this table.
Thank you very much! Laura
Ok so the issue is the "NA" in the accChr and accPos column. This is probably due to the lines 55-57:
inputDT[accChr == "", accPos := END]
inputDT[accChr == "", accChr := donChr]
inputDT[, END := NULL]
This will set the values for the acceptor information based on donChr and END column.
I tried it for the 7 lines you sent me and it is working without issues.
Could you please check what is happening with the mentioned find and replace commands?
I.e. if the values in accChr are empty ("") like we expect them to be, maybe there are some hidden whitespaces.
inputDT[accChr == ""]
Hi,
I have tried and what I think is happening is that setting the values for acceptor is no working at all.
I execute for ie:
inputDT[accChr == "", accPos := END]
And none of the NA are change in to the right value. They keep the NA value. It seems that NAs arre not recognized since with
inputDT[accChr == ""]
Te result is:
Empty data.table (0 rows and 27 cols): donChr,donPos,accChr,accPos,PE,SR...
But with ie:
inputDT[accChr == "1"]
I get:
donChr donPos accChr accPos PE SR PRECISE IMPRECISE tumor.DR
1: 10 3251873 1 54408453 4 NA FALSE TRUE 41
2: 10 3990666 1 128638550 2 NA FALSE TRUE 58
3: 10 3990826 1 128638448 5 NA FALSE TRUE 53
4: 10 4499209 1 88254067 0 9 TRUE FALSE 0
5: 10 6088966 1 84965515 0 5 TRUE FALSE 0
---
1296: Y 70166070 1 145917349 2 NA FALSE TRUE 19
1297: Y 81822591 1 125892066 5 NA FALSE TRUE 21
1298: Y 81822736 1 125891794 3 NA FALSE TRUE 13
1299: Y 82439135 1 3145039 2 NA FALSE TRUE 19
1300: Y 88589743 1 179980522 2 NA FALSE TRUE 21
tumor.DV tumor.RR tumor.RV control.DR control.DV control.RR control.RV
1: 2 0 0 48 2 0 0
2: 2 0 0 38 2 0 0
3: 2 0 0 36 2 0 0
4: 2 53 37 0 2 60 50
5: 2 20 8 0 2 25 9
---
1296: 0 0 0 12 -1 0 0
1297: 0 0 0 19 -1 0 0
1298: 0 0 0 10 -1 0 0
1299: 0 0 0 25 -1 0 0
1300: 0 0 0 23 -1 0 0
MAPQ Type control.pairs.ratio control.SR.ratio tumor.pairs.ratio
1: 41 5to5 0.04000000 0.0000000 0.04651163
2: 59 5to5 0.05000000 0.0000000 0.03333333
3: 60 3to3 0.05263158 0.0000000 0.03636364
4: 0 3to3 1.00000000 0.4545455 1.00000000
5: 0 3to5 1.00000000 0.2647059 1.00000000
---
1296: 42 3to5 -0.09090909 0.0000000 0.00000000
1297: 60 5to5 -0.05555556 0.0000000 0.00000000
1298: 60 3to3 -0.11111111 0.0000000 0.00000000
1299: 47 3to5 -0.04166667 0.0000000 0.00000000
1300: 27 5to5 -0.04545455 0.0000000 0.00000000
tumor.SR.ratio control.coverage tumor.coverage control.varFreq
1: 0.0000000 50 43 0.04000000
2: 0.0000000 40 60 0.05000000
3: 0.0000000 38 55 0.05263158
4: 0.4111111 112 92 0.46428571
5: 0.2857143 36 30 0.30555556
---
1296: 0.0000000 11 19 -0.09090909
1297: 0.0000000 18 21 -0.05555556
1298: 0.0000000 9 13 -0.11111111
1299: 0.0000000 24 19 -0.04166667
1300: 0.0000000 22 21 -0.04545455
tumor.varFreq distances
1: 0.04651163 NA
2: 0.03333333 NA
3: 0.03636364 NA
4: 0.42391304 NA
5: 0.33333333 NA
---
1296: 0.00000000 NA
1297: 0.00000000 NA
1298: 0.00000000 NA
1299: 0.00000000 NA
1300: 0.00000000 NA
Sorry for all the trouble. I can not undertand the problem.
Laura
Yes, so these are inter-chromosomal breakpoints, which will always be there like this. Those are filtered out by default (intraChromOnly).
donChr donPos accChr accPos
10 3251873 1 54408453
As I said, we expect the intra-chromosomal to be an empty value (""), which should be the output of SnpSift.jar extractFields. We need to find out what the actual value of CHR2 and POS2 is in your Delly/name.breakpoints.tab.
CHROM POS CHR2 POS2 END
1 3003408 81643743
How about this?
inputDT[is.na(accChr)]
Alternatively you can attach me the subset of the original Delly/name.breakpoints.tab file (please extracted without parsing via console and with some of the lines with empty CHR2 values as shown above).
Thanks, I understand that only intrachromosomal are kept. (I wonder if it possible to obtain also filtered interchromosomal?).
This inputDT is the table I get before the final step to keep only intrachromosomal arrangements. Since the inputDT after this step its empty. Just to be sure we are talking about the same data file.
I am attaching a subset obtained like this.
head -1000 NAME.breakpoints.tab > subset.breakpoints.tab
subset.breakpoints.txt
The output of the comand:
> inputDT[is.na(accChr)]
donChr donPos accChr accPos PE SR PRECISE IMPRECISE tumor.DR tumor.DV
1: 1 3003408 NA NA 4 NA FALSE TRUE 22 4
2: 1 3003441 NA NA 2 NA FALSE TRUE 28 4
3: 1 3153301 NA NA 2 NA FALSE TRUE 31 6
4: 1 3232541 NA NA 6 NA FALSE TRUE 32 4
5: 1 3233304 NA NA 6 NA FALSE TRUE 39 4
---
34025: Y 91313818 NA NA 2 NA FALSE TRUE 0 0
34026: Y 91314427 NA NA 2 NA FALSE TRUE 0 0
34027: Y 91317516 NA NA 2 NA FALSE TRUE 0 0
34028: Y 91318976 NA NA 2 NA FALSE TRUE 0 0
34029: Y 91319175 NA NA 3 NA FALSE TRUE 0 0
tumor.RR tumor.RV control.DR control.DV control.RR control.RV MAPQ Type
1: 0 0 28 4 0 0 1 5to3
2: 0 0 21 4 0 0 1 5to3
3: 0 0 31 7 0 0 1 3to3
4: 0 0 40 4 0 0 1 5to3
5: 0 0 44 4 0 0 1 3to3
---
34025: 0 0 0 -1 0 0 13 3to5
34026: 0 0 0 -1 0 0 1 5to3
34027: 0 0 0 -1 0 0 49 5to3
34028: 0 0 0 -1 0 0 1 5to3
34029: 0 0 0 -1 0 0 1 3to5
control.pairs.ratio control.SR.ratio tumor.pairs.ratio tumor.SR.ratio
1: 0.12500000 0 0.15384615 0
2: 0.16000000 0 0.12500000 0
3: 0.18421053 0 0.16216216 0
4: 0.09090909 0 0.11111111 0
5: 0.08333333 0 0.09302326 0
---
34025: 1.00000000 0 0.00000000 0
34026: 1.00000000 0 0.00000000 0
34027: 1.00000000 0 0.00000000 0
34028: 1.00000000 0 0.00000000 0
34029: 1.00000000 0 0.00000000 0
control.coverage tumor.coverage control.varFreq tumor.varFreq distances
1: 32 26 0.12500000 0.15384615 NA
2: 25 32 0.16000000 0.12500000 NA
3: 38 37 0.18421053 0.16216216 NA
4: 44 36 0.09090909 0.11111111 NA
5: 48 43 0.08333333 0.09302326 NA
---
34025: -1 0 1.00000000 0.00000000 NA
34026: -1 0 1.00000000 0.00000000 NA
34027: -1 0 1.00000000 0.00000000 NA
34028: -1 0 1.00000000 0.00000000 NA
34029: -1 0 1.00000000 0.00000000 NA
Thank you very much!
As expected, the problem was the cell value being NA instead of "". I pushed a fix for this just now.
These are the small changes to catch both cases:
# set types to avoid warnings
inputDT[, accPos := as.integer(accPos)]
inputDT[, accChr := as.integer(accChr)]
# empty acceptor CHROM+POS is equal to changes on the same chromosome
inputDT[accChr == "" | is.na(accChr), accPos := END]
inputDT[accChr == "" | is.na(accChr), accChr := donChr]
inputDT[, END := NULL]
Now this should work as expected, but maybe you have to adjust the cutoffs for your data (control.varFreq, tumor.varFreq, distances).
Thank you! This worked!! I started with 43371 lines in the Name.breakpoints.tab and now I get 33704 in the filtered. Do you think is enough strict in your experience?
No problem :)
Sorry, I can not tell you anything about the filtering just by the number of lines. I would recommend evaluating everything by the criteria described in the publication (i.e. Figure 11).
Hi! I am having some issues with the Delly and Chromotripsis detection. I have done al the steps in the interactive mode and everything seems ok until I get to the $name.breakpoints.filtered.tab which only contains NAs. I have started already from bams, so I have not done all the steps in the pipeline. Maybe it is something related to it. I would really appreciate some advice.
Thank you very much, Laura