mskilab-org / JaBbA

MIP based joint inference of copy number and rearrangement state in cancer whole genome sequence data.
MIT License

fragCounter automatically strips out 'chr' from the chromosome names in the final cov.rds output #21

Closed: pwaltman closed this issue 5 years ago

pwaltman commented 5 years ago

Not sure whether I should post this here or in the flows repo, but fragCounter automatically strips 'chr' from the chromosome names of the regions in the final cov.rds file it produces. This causes problems if your samples are aligned to the Broad's hg38 genome, which does include 'chr' in its chromosome names, because the VCFs produced by any breakpoint caller will then use chr-prefixed chromosome names.

As a result, JaBbA discards all of the breakpoints because it "thinks" they fall in regions with NA coverage. If you're going to standardize on chromosome names without the 'chr' prefix, maybe JaBbA should automatically apply the same adjustment to the breakpoints it reads in (?). Since the Broad appears to have now standardized on hg38 for all of its tools in GATK v4.x, this will increasingly be an issue.
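As a rough illustration of the adjustment suggested above, here is a minimal Python sketch. It is not JaBbA or fragCounter code, and the function names are hypothetical. It checks whether the coverage uses 'chr'-prefixed names and rewrites breakpoint chromosome names to match. It assumes the two naming schemes differ only by the prefix, so cases such as chrM vs MT or differently named unplaced contigs are not handled.

```python
def uses_chr_prefix(seqnames):
    """Hypothetical helper: True if every reference seqname starts with 'chr'."""
    return bool(seqnames) and all(s.startswith("chr") for s in seqnames)


def harmonize(seqname, ref_has_chr):
    """Hypothetical helper: convert one chromosome name to the reference style.

    Only the 'chr' prefix is added or stripped; mitochondrial naming
    (chrM vs MT) and other contig aliases are deliberately not handled.
    """
    if ref_has_chr and not seqname.startswith("chr"):
        return "chr" + seqname
    if not ref_has_chr and seqname.startswith("chr"):
        return seqname[len("chr"):]
    return seqname


if __name__ == "__main__":
    # Coverage without 'chr' (as fragCounter writes it), breakpoints with 'chr'.
    coverage_seqnames = ["1", "2", "X"]
    breakpoints = [("chr1", 1000), ("chrX", 5000)]

    ref_has_chr = uses_chr_prefix(coverage_seqnames)
    fixed = [(harmonize(chrom, ref_has_chr), pos) for chrom, pos in breakpoints]
    print(fixed)  # [('1', 1000), ('X', 5000)]
```

Checking the naming style on the coverage side, rather than hard-coding one convention, means the same logic works whichever way a given reference names its chromosomes.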

xtYao commented 5 years ago

Yes, this is a known issue. It comes from the previous default behaviour of gUtils::hg_seqlengths(). We've changed that, so it now respects the actual seqnames of the GRanges object: for example, if your input coverage has "chr", it will keep it. Please check whether the default value of the chr parameter of gUtils::hg_seqlengths() in your installation is TRUE.

Meanwhile, Trent in the group is going to publish fragCounter as a formal R package. I'll redirect this issue to that repo.

pwaltman commented 5 years ago

OK, the issue is that fragCounter has already stripped the 'chr' from the names. I guess I can add the prefix back and re-save the rds files, although that's a pain in the neck.