mskilab-org / JaBbA

MIP based joint inference of copy number and rearrangement state in cancer whole genome sequence data.
MIT License
56 stars 26 forks source link

jba is not passing the coverage and junctions arguments to the JaBbA correctly #18

Closed pwaltman closed 5 years ago

pwaltman commented 5 years ago

Have you forgotten to check in the correct/updated version of the jba executable?

As a test of the install, I installed JaBbA into a new module that I've called "JaBbA_simplified", but it doesn't work with the test data that you provide. Here's the output from that test:

waltman@waltman-workstation:~$ cd modules/JaBbA_simplified/
waltman@waltman-workstation:~/modules/JaBbA_simplified$ ls
hydrant.deploy  hydrant.deploy~  JaBbA  run.sh  run.sh~
waltman@waltman-workstation:~/modules/JaBbA_simplified$ ls JaBbA/
configure  configure.ac  DESCRIPTION  inst  JaBbA.pdf  jba  NAMESPACE  R  README.md  rtdocs  srcs  tests
waltman@waltman-workstation:~/modules/JaBbA_simplified$ JaBbA/jba JaBbA/inst/extdata/junctions.vcf JaBbA/inst/extdata/coverage.txt

 _____         ___    _      _____
(___  )       (  _`\ ( )    (  _  )
    | |   _ _ | (_) )| |_   | (_) |
 _  | | /'_` )|  _ <'| '_`\ |  _  |
( )_| |( (_| || (_) )| |_) )| | | |
`\___/'`\__,_)(____/'(_,__/'(_) (_)

(Junction     Balance     Analysis)

JaBbA 2019-01-20 11:08:41: Located junction file JaBbA/inst/extdata/junctions.vcf
JaBbA 2019-01-20 11:08:41: Located coverage file JaBbA/inst/extdata/coverage.txt
JaBbA 2019-01-20 11:08:41: Loading packages ...
JaBbA 2019-01-20 11:08:42: Starting analysis in /ifs/scratch/c2b2/ac_lab/pw2470/devel/mskilab/flows/modules/JaBbA_simplified
JaBbA 2019-01-20 11:08:42: Read in 0 total junctions
JaBbA 2019-01-20 11:08:42: Empty junction input. Will do just one round of optimization.
Error in dt2gr(coverage) :
  Error: Input needs to be data.table or data.frame
Calls: suppressPackageStartupMessages ... withCallingHandlers -> JaBbA -> jabba_stub -> dt2gr
In addition: There were 16 warnings (use warnings() to see them)
Execution halted
pwaltman commented 5 years ago

checking the code of the jba rscript that's in the repo, it creates an rds file called cmd.args.rds that stores a list with the 2 elements "junctions" and 'coverage', but it doesn't use those in the call to Jabba, i.e. here's the relelvant code snippet from the jba rscript, below. Shouldn't it be passing opt$coverage instead of opt$abu and opt$junctions instead of junctions$ra?

...
    jmessage('Located junction file ', opt$junctions)
    jmessage('Located coverage file ', opt$coverage)
    jmessage('Loading packages ...')
    writeLines(paste(paste("--", names(opt), " ", sapply(opt, function(x) paste(x, collapse = ",")), sep = "", collapse = " "), sep = ""), paste(opt$outdir, "cmd.args", sep = "/"))
    saveRDS(opt, paste(opt$outdir, "cmd.args.rds", sep = "/"))
    jab = JaBbA(coverage = opt$abu,
                blacklist = opt$blacklist,
                junctions = opt$ra,
                juncs.uf = opt$ra.supp,
                seg = opt$seg,
                cfield = opt$cfield,
                tfield = opt$tfield,
                nseg = opt$nseg,
                hets = opt$hets,
                outdir = opt$outdir,
                subsample = opt$subsample,
                name = opt$name,
                field = opt$field,
                slack = opt$slack,
                loose.penalty.mode = ifelse(opt$boolean, "boolean", "linear"),
                strict = as.logical(opt$strict),
                tilim = opt$tilim,
                ploidy = opt$ploidy,
                purity = opt$purity,
                pp.method = opt$ppmethod,
                verbose = 2,
                mc.cores = opt$cores,
                reiterate = opt$iterate,
                rescue.window = opt$window,
                edgenudge = opt$edgenudge,
                nudge.balanced = opt$nudgebalanced,
                indel = opt$indel,
                all.in = opt$allin,
                use.gurobi = opt$gurobi,
                epgap = opt$epgap,
                max.na = opt$maxna)
pwaltman commented 5 years ago

In an attempt to fix the issue, I modified the jba rscript so that it set opt$abu and opt$ra to opt$coverage and opt$junctions, respectively, but now, I'm getting a different error - that the junctions file doesn't contain any junctions (see below). However, I checked, and the test junctions file definitely contains breakpoints, so I'm at a bit of a loss at this point.

waltman@waltman-workstation:~/modules/JaBbA_simplified$ JaBbA/jba JaBbA/inst/extdata/junctions.vcf JaBbA/inst/extdata/coverage.txt

 _____         ___    _      _____
(___  )       (  _`\ ( )    (  _  )
    | |   _ _ | (_) )| |_   | (_) |
 _  | | /'_` )|  _ <'| '_`\ |  _  |
( )_| |( (_| || (_) )| |_) )| | | |
`\___/'`\__,_)(____/'(_,__/'(_) (_)

(Junction     Balance     Analysis)

JaBbA 2019-01-20 11:20:30: Located junction file JaBbA/inst/extdata/junctions.vcf
JaBbA 2019-01-20 11:20:30: Located coverage file JaBbA/inst/extdata/coverage.txt
JaBbA 2019-01-20 11:20:30: Loading packages ...
JaBbA 2019-01-20 11:20:31: Starting analysis in /ifs/scratch/c2b2/ac_lab/pw2470/devel/mskilab/flows/modules/JaBbA_simplified
JaBbA 2019-01-20 11:20:31: Read in 0 total junctions
JaBbA 2019-01-20 11:20:31: Empty junction input. Will do just one round of optimization.
JaBbA 2019-01-20 11:20:31: Importing seg from UCSC format
Error: scanVcf: 'indexname' must be character(1)
  path: JaBbA/inst/extdata/junctions.vcf
In addition: There were 16 warnings (use warnings() to see them)
Execution halted

my fix:

    opt = NULL;
    if (!is.null(args)) {
        opt = args$options
        opt$junctions = args$args[1]
        opt$coverage = args$args[2]
        if (!file.exists(opt$junctions))
        {   
            message('Did not find junction file ', opt$junctions)
            print_help(parseobj); stop()
        }
        if (!file.exists(opt$coverage))
        {   
            message('Did not find coverage file ', opt$coverage)
            print_help(parseobj); stop()
        }

        if ( is.null( opt$abu ) ) opt$abu <- opt$coverage
        if ( is.null( opt$ra ) ) opt$ra <- opt$junctions
    } else {
        stop("See usage.")
    }
pwaltman commented 5 years ago

UPDATE: after reading the code, I realized that I should set the DEFAULT_GENOME & DEFAULT_BSGENOME environment variables to a seq_lengths file, but am now getting the following error when I use the provided test data:

waltman@waltman-workstation:~/modules/JaBbA_simplified$ JaBbA/jba JaBbA/inst/extdata/junctions.vcf JaBbA/inst/extdata/coverage.txt

 _____         ___    _      _____
(___  )       (  _`\ ( )    (  _  )
    | |   _ _ | (_) )| |_   | (_) |
 _  | | /'_` )|  _ <'| '_`\ |  _  |
( )_| |( (_| || (_) )| |_) )| | | |
`\___/'`\__,_)(____/'(_,__/'(_) (_)

(Junction     Balance     Analysis)

JaBbA 2019-01-20 12:21:21: Located junction file JaBbA/inst/extdata/junctions.vcf
JaBbA 2019-01-20 12:21:21: Located coverage file JaBbA/inst/extdata/coverage.txt
JaBbA 2019-01-20 12:21:21: genome env var /ifs/scratch/c2b2/ac_lab/shares/resources/Homo_sapiens/GATK/b37/human_g1k_v37_decoy.seq_lengths
JaBbA 2019-01-20 12:21:21: bsgenome env var /ifs/scratch/c2b2/ac_lab/shares/resources/Homo_sapiens/GATK/b37/human_g1k_v37_decoy.seq_lengths
JaBbA 2019-01-20 12:21:21: Loading packages ...
JaBbA 2019-01-20 12:21:22: Starting analysis in /ifs/scratch/c2b2/ac_lab/pw2470/devel/mskilab/flows/modules/JaBbA_simplified
JaBbA 2019-01-20 12:21:24: ALT field format like BND
JaBbA 2019-01-20 12:21:24: Read in 83 total junctions
JaBbA 2019-01-20 12:21:24: Read in coverage data across 26,625 bins and 2 chromosomes
JaBbA 2019-01-20 12:21:24: No segmentation provided, so performing segmentation using CBS
JaBbA 2019-01-20 12:21:27: Segmentation finished
JaBbA 2019-01-20 12:21:28: 240 segments produced
JaBbA 2019-01-20 12:21:28: Found tier field enforcing >=1 CN at 0 junctions
JaBbA 2019-01-20 12:21:28: Removing 0 tier 3 junctions
JaBbA 2019-01-20 12:21:28: Did not find nseg file! Ignore.
JaBbA 2019-01-20 12:21:29: Definining coverage good quality nodes as 90% bases covered by non-NA and non-Inf values in +/-100KB region
JaBbA 2019-01-20 12:21:29: Hard setting 8.61001 Mb of the genome to NA that didn't pass our quality threshold
JaBbA 2019-01-20 12:21:29: Using loess to fit mean to variance relationship in segments with greater than 20 bins
JaBbA 2019-01-20 12:21:29: Using ppgrid to estimate purity ploidy
JaBbA 2019-01-20 12:21:29: setting up ppgrid matrices .. 

.....................................................................................................
.....................................................................................................
JaBbA 2019-01-20 12:21:31: Built gGraph with 592 nodes, 570 edges, purity 0.95, and ploidy 3.82
JaBbA 2019-01-20 12:21:34: Brand new function for reciprocal junctions calling.
.
JaBbA 2019-01-20 12:21:34: Excluding 0 aberrant junctions whose both breakpoints are in NA coverage regions
JaBbA 2019-01-20 12:21:34: Cancel nudge for 12 aberrant junctions where one of the 2 breakpoint is in NA coverage regions
JaBbA 2019-01-20 12:21:34: In sum, we are forcing 0 junctions, excluding 0 junctions, and nudging 0 junctions
JaBbA 2019-01-20 12:21:35: Creating ILOG CPLEX PARAMETER FILE in /ifs/scratch/c2b2/ac_lab/pw2470/devel/mskilab/flows/modules/JaBbA_simplified/jabba.raw.rds.prm
JaBbA 2019-01-20 12:21:36: Adjusting the kag (naive solution) as mipstart (initial solution).
JaBbA 2019-01-20 12:21:36: Applying mipstarts from previous jabba solution
JaBbA 2019-01-20 12:21:36: Fixing 106 nodes that are unmovable by slack 
JaBbA 2019-01-20 12:21:36: Partitioned graph into 78 connected components with the size of the highest 10 components being:
104,50,46,38,32,26,14,14,14,10
JaBbA 2019-01-20 12:21:37: Junction balancing subgraph 1 of 78 which has 102 nodes comprising 9.77 MB and 1 chromosomes, including chrs 19
JaBbA 2019-01-20 12:21:37: Starting initial run for subgraph 1, aiming epgap at 1e-04, within time limit of 360
JaBbA 2019-01-20 12:21:37: Applying L0 slack penalty
JaBbA 2019-01-20 12:21:37: Total mass on cn portion of objective function: 29639.027491637. Total mass on edge slack: 102000
JaBbA 2019-01-20 12:21:37: Running CPLEX with relative optimality gap threshold 1e-04
Error in names(x) <- c("xopt", "obj", "status", "extra", "epgap") : 
  'names' attribute [5] must be the same length as the vector [1]
Calls: suppressPackageStartupMessages ... FUN -> do.call -> jbaMIP -> Rcplex2 -> .canonicalize
In addition: There were 18 warnings (use warnings() to see them)
Error in .C("Rcplex_free", PACKAGE = "JaBbA") : 
  "Rcplex_free" not available for .C() for package "JaBbA"
Calls: suppressPackageStartupMessages ... lapply -> FUN -> do.call -> jbaMIP -> Rcplex2 -> .C
Execution halted
mskilab commented 5 years ago

Hi Peter - thanks for pointing this out. indeed jba had drifted away from the updated JaBbA arguments ... please try now

I also updated so that empty DEFAULT_GENOME or DEFAULT_BSGENOME environment variable doesn't break ... it just infers from the provided vcf and coverage files.

If you want to set these variables to custom values - use a path or URL to a UCSC style .chrom.sizes file or related format

Let me know if that fixes

On Sun, Jan 20, 2019 at 11:27 AM Peter Waltman notifications@github.com wrote:

In an attempt to fix the issue, I modified the jba rscript so that it set opt$abu and opt$ra to opt$coverage and opt$junctions, respectively, but now, I'm getting a different error - that the junctions file doesn't contain any junctions (see below). However, I checked, and the test junctions file definitely contains breakpoints, so I'm at a bit of a loss at this point.

waltman@waltman-workstation:~/modules/JaBbA_simplified$ JaBbA/jba JaBbA/inst/extdata/junctions.vcf JaBbA/inst/extdata/coverage.txt


( ) ( \ ( ) ( _ ) | | _ _ | (_) )| |_ | (_) | _ | | /'_ )| <'| '\ | _ | ( )_| |( (_| || (_) )| |_) )| | | | _/'`_,)(_/'(,/'() ()

(Junction Balance Analysis)

JaBbA 2019-01-20 11:20:30: Located junction file JaBbA/inst/extdata/junctions.vcf JaBbA 2019-01-20 11:20:30: Located coverage file JaBbA/inst/extdata/coverage.txt JaBbA 2019-01-20 11:20:30: Loading packages ... JaBbA 2019-01-20 11:20:31: Starting analysis in /ifs/scratch/c2b2/ac_lab/pw2470/devel/mskilab/flows/modules/JaBbA_simplified JaBbA 2019-01-20 11:20:31: Read in 0 total junctions JaBbA 2019-01-20 11:20:31: Empty junction input. Will do just one round of optimization. JaBbA 2019-01-20 11:20:31: Importing seg from UCSC format Error: scanVcf: 'indexname' must be character(1) path: JaBbA/inst/extdata/junctions.vcf In addition: There were 16 warnings (use warnings() to see them) Execution halted

my fix:

opt = NULL;
if (!is.null(args)) {
    opt = args$options
    opt$junctions = args$args[1]
    opt$coverage = args$args[2]
    if (!file.exists(opt$junctions))
    {
        message('Did not find junction file ', opt$junctions)
        print_help(parseobj); stop()
    }
    if (!file.exists(opt$coverage))
    {
        message('Did not find coverage file ', opt$coverage)
        print_help(parseobj); stop()
    }

    if ( is.null( opt$abu ) ) opt$abu <- opt$coverage
    if ( is.null( opt$ra ) ) opt$abu <- opt$junctions
} else {
    stop("See usage.")
}

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mskilab/JaBbA/issues/18#issuecomment-455880557, or mute the thread https://github.com/notifications/unsubscribe-auth/AL-0kWMoCItxtE8b70O2xC8O9kBPWeceks5vFJj4gaJpZM4aJt5H .

xtYao commented 5 years ago

@pwaltman Hi Peter, I've cleaned up the argument list in the jba script. Please try again and the names should be slightly more intuitive now. Thanks for bringing this up.

pwaltman commented 5 years ago

Great, I'll try the new one shortly, but there was one more fix that I had to make to get it running. the vcf(s) that I'm passing in come from svaba, so they use the bam file as the sample identifier in the vcf, and this causes the call to JaBbA() to fail because the default argument for the sample name (opt$name) is set 'tumor'. I've come up with a workaround of editting the jba rscript to allow a 3rd argument to be passed to jba, which is the bam/sample name; and it seems to be working, for the moment at least (it's currently doing the CBS calculation).

KathyCat1 commented 1 year ago

module load R/4.2.1 jba ~/JaBbA-master/inst/extdata/junctions.vcf ~/JaBbA-master/inst/extdata/coverage.txt I still got the same error when I just run test data, how should fix it? Thanks!
Starting optimization with CPLEX! Error in names(x) <- c("xopt", "obj", "status", "extra", "epgap") : 'names' attribute [5] must be the same length as the vector [1] Calls: suppressPackageStartupMessages ... ramip_stub -> jbaLP -> balance -> -> .canonicalize In addition: There were 17 warnings (use warnings() to see them) Error in .C("Rcplex_free", PACKAGE = "gGnome") : "Rcplex_free" not available for .C() for package "gGnome" Calls: suppressPackageStartupMessages ... ramip_stub -> jbaLP -> balance -> -> .C Execution halted