mskilab-org / JaBbA

MIP based joint inference of copy number and rearrangement state in cancer whole genome sequence data.
MIT License
56 stars 25 forks source link

double chromosomes notation in JaBbA's output #87

Closed l0ka closed 1 year ago

l0ka commented 1 year ago

Hi, I'm running JaBbA using as input the coverage file from fragCounter, the junctions bedpe from Delly, the segments rds file from CNVkit and purity and ploidy from Sequenza. Both the bedpe and segfile have the chromosomes in UCSC format, with the "chr" prefix. As you know, fragCounter removes the "chr" but when I run JaBbA with the coverage file with or without the "chr", I always get the output with the chromosome annotated with both "chr" and without. These are my input files:

## Coverage file, fragCounter default:
cov <- rtracklayer::import("cov.corrected.bw")
seqlevels(cov)
"1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16"
"17" "18" "19" "20" "21" "22" "X"  "Y"

## Segments
seg <- readRDS("sample.segs.rds")
seqlevels(seg)
"chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"  "chr10"
"chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr20"
"chr21" "chr22" "chrX"  "chrY"

## Junctions
jun <- fread("sample.bedpe", data.table = F)
unique(jun$V1)
"chr1"  "chr2"  "chr3"  "chr4"  "chr6"  "chr7"  "chr8"  "chr9"  "chr10" "chr11" 
"chr12" "chr13" "chr15" "chr16" "chr17"

When I import JaBbA's output in gGnome I get:

gg.jabba = gG(jabba = "sample/jabba.simple.gg.rds")
seqlevels(gg.jabba)
"1" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2" "20" "21" "22" "3" "4"
"5" "6" "7" "8" "9" "X" "Y" "chr1" "chr10" "chr11" "chr12" "chr13" "chr14"
"chr15" "chr16" "chr17" "chr18" "chr19" "chr2" "chr20" "chr21" "chr22" "chr3"
"chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chrX" "chrY"

And this is the plot for chromosome 7, with the warning about chr notation:

plot(gg.jabba$gt, y = '7')

Warning messages:
1: In gUtils::gr.sub(tmp.dat, "chr", "") :
  Warning: gr.sub had to convert GRanges to data.table before replacing seqlevels: check input seqlevels e.g. for mixed chr and non-chr seqlevels
2: In matrix(paste("c", rep(1:ncol(B), each = 2), c(".x", ".y"), sep = ""),  :
  data length differs from size of matrix: [12 != 2 x 4]

default

When I add the "chr" to the coverage file:

cov  <- rtracklayer::import("cov.corrected.bw")
seqlevelsStyle(cov) <- "UCSC"
seqlevels(cov)
"chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" "chr11"
"chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr20" "chr21"
"chr22" "chrX"  "chrY"

run JaBbA and import the output in gGnome, I still get the same result:

gg.jabba = gG(jabba = "sample_chr/jabba.simple.gg.rds")
seqlevels(gg.jabba)
"1" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2" "20" "21" "22" "3" "4"
"5" "6" "7" "8" "9" "X" "Y" "chr1" "chr10" "chr11" "chr12" "chr13" "chr14"
"chr15" "chr16" "chr17" "chr18" "chr19" "chr2" "chr20" "chr21" "chr22" "chr3"
"chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chrX" "chrY"

The plot now looks like that (and obviously I get exactly the same warning as before):

chr

Am I doing something wrong? Thank you in advance!

l0ka commented 1 year ago

EDIT: I managed to solve the warning simply by removing the "chr" from both the junctions and segments files.

Moreover, I also was experiencing issue #85 for some samples, so I tried using R 4.1, as suggested. It works without giving the error but I noticed that even for the samples for which I wasn't getting the error the results are different compared to when using R 4.2. For example, in the SVs classification (on the same sample):

jabba.ev <- events(gg.jabba, verbose = TRUE)
jabba.ev$meta$event[, table(type)]

## R 4.2
# Chromothripsis:             1
# Double minute:              1
# Templated insertion chains: 1

## R 4.1
# chromothripsis: 1                                 
# del:            9
# dup:            3
# tyfonas:        1