Closed bobrad98 closed 1 year ago
We have not tested on hg38 alt. Do you have the error file from the R script,
i.e. /tmp/ratioSegmentation.R4578502221624741866.error?
The R script is pretty simple (see https://github.com/hartwigmedical/hmftools/blob/master/cobalt/src/main/resources/r/ratioSegmentation.R)
and just reads the ratiofile and pcffile (not the ref genome), so it must not like one of the chromosome outputs written by cobalt to these files.
Hi,
For now, unfortunately, I'm not able to get the error file from the script. When I get more info, I'll share it here.
I would recommend running that script line by line, using the ratiofile and pcffile as inputs, to see where it errors out.
Otherwise, if you are happy to share those 2 files, either here or via p.priestley@hartwigmedicalfoundation.nl, then I can do it for you.
I managed to see the state of the tool when the errors happen for the following command line input:
java -jar -Xmx8G cobalt-1.13.jar \
-reference NOR \
-reference_bam $NORMAL \
-tumor TUM \
-tumor_bam $TUMOR \
-gc_profile $GC \
-output_dir /sbgenomics/workspace/cobalt \
-threads 36
where $NORMAL and $TUMOR are paths to the respective files.
The output directory contains the following files:
so the error is coming from the execution of the ratioSegmentation script on the reference data.
The error file says:
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:dplyr’:
    combine, intersect, setdiff, union
The following objects are masked from ‘package:stats’:
    IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min
Warning: pcf is not run for sample 1 on chromosome arm because all observations are missing. NA is returned.
Error in data.frame(rep(sampleid[i], nSeg), seg.chrom, seg.arm, pos.start, : arguments imply differing number of rows: 1, 0
Calls: pcf -> data.frame
Execution halted
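The warning in that log ("all observations are missing") suggests that at least one chromosome in the ratio file has no usable values. A rough way to find out which one is to scan the file per chromosome, sketched below in Python. Note the assumptions: the column names `chromosome` and `referenceGCDiploidRatio` and the rule "a value counts as present only if it is greater than 0" are guesses about the file layout, not something confirmed in this thread, and `chromosomes_without_data` is just a hypothetical helper name. Adjust them to match the header of your own file.

```python
import csv
import io
from collections import defaultdict


def chromosomes_without_data(tsv_text,
                             chrom_col="chromosome",               # assumed column name
                             value_col="referenceGCDiploidRatio"):  # assumed column name
    """Return chromosomes for which no row has a positive value in value_col."""
    has_data = defaultdict(bool)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            value = float(row[value_col])
        except ValueError:
            # Unparsable entries (e.g. "NA") are treated as missing.
            value = float("nan")
        # NaN > 0 is False, so missing values never mark a chromosome as usable.
        has_data[row[chrom_col]] |= value > 0
    return [chrom for chrom, ok in has_data.items() if not ok]


# Tiny made-up example; "some_alt_contig" is a placeholder name, not real output.
sample = (
    "chromosome\tposition\treferenceGCDiploidRatio\n"
    "1\t1\t1.01\n"
    "1\t1001\t-1\n"
    "some_alt_contig\t1\t-1\n"
)
print(chromosomes_without_data(sample))
```

To run it on a real file, pass the file contents, e.g. `chromosomes_without_data(open("cobalt.ratio.tsv").read())`. Any chromosome it reports would be a candidate for the one the segmentation step cannot handle.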
Hi there,
I took a look at the file you sent me (cobalt.ratio.tsv) and I noticed 2 issues with it
In this case, I used a reference genome without the chr notation. The idea is to be able to use any reference genome file, regardless of the notation used. Does this mean that COBALT works only with files that use the chr notation?
All HMF tools require that GRCh37 has no 'chr' prefix and GRCh38 does have it.
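That naming rule can be checked before starting a run. Below is a minimal Python sketch, assuming you already have the list of contig names (for example from the BAM header); `check_chr_prefix` is a hypothetical helper, not part of any HMF tool, and it only tests the prefix rule stated above.

```python
def check_chr_prefix(contigs, genome_version):
    """Return the contig names that break the prefix rule for the given version.

    Rule taken from the comment above: GRCh37 names carry no 'chr' prefix,
    GRCh38 names do.
    """
    expect_prefix = {"GRCh37": False, "GRCh38": True}[genome_version]
    return [name for name in contigs if name.startswith("chr") != expect_prefix]


# Contigs named without 'chr' checked against GRCh38: every name is flagged.
print(check_chr_prefix(["1", "2", "X"], "GRCh38"))
# Contigs named with 'chr' checked against GRCh38: nothing is flagged.
print(check_chr_prefix(["chr1", "chrX"], "GRCh38"))
```

An empty result means the names you passed in follow the convention for that genome version. It says nothing about whether ALT contigs are supported.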
thanks!
Hi,
COBALT fails when I run it on alignment files that are aligned to a reference genome containing ALT contigs. To be more precise, for command line input
Stdout reports an error:
while the internal log says that the error is regarding the execution of the ratioSegmentation R script:
This doesn’t occur when running COBALT on alignment files that are aligned to a reference genome without ALT contigs. The following command line input finishes successfully (the only difference is in the input files):
Why is this happening and how can I solve it? Thanks in advance!