mskilab-org / fragCounter

GC and mappability corrected fragment coverage for paired end whole genome sequencing
MIT License
7 stars 11 forks source link

Keep "chr" notation #23

Closed l0ka closed 1 year ago

l0ka commented 1 year ago

Hi, I get an error when running fragCounter: Presegmenting at NA bp scale and it stops.

My BAMs/CRAMs have the chromosomes with the "chr" notation. The problem is in this piece of code, within the correctcov_stub() function:

map = gr.sub(map)  # <- removes the "chr" even if the original map object has that notation
gc  = gr.sub(gc)   # <- removes the "chr" even if the original gc object has that notation

gc.str  = gr.string(gc)  ## without "chr"
map.str = gr.string(map) ## without "chr"
cov.str = gr.string(cov) ## has "chr"

Then:

all.str = intersect(intersect(map.str, cov.str), gc.str) obviously results in character(0) and therefore also the other objects are empty:

>   map = map[match(all.str, map.str)]
>   cov = cov[match(all.str, cov.str)]
>   gc  = gc[match(all.str, gc.str)]

> map
GRanges object with 0 ranges and 1 metadata column:
   seqnames    ranges strand |     score
      <Rle> <IRanges>  <Rle> | <numeric>
  -------
  seqinfo: 24 sequences from an unspecified genome

> cov
GRanges object with 0 ranges and 1 metadata column:
   seqnames    ranges strand |     score
      <Rle> <IRanges>  <Rle> | <numeric>
  -------
  seqinfo: 3366 sequences from an unspecified genome

> gc
GRanges object with 0 ranges and 1 metadata column:
   seqnames    ranges strand |     score
      <Rle> <IRanges>  <Rle> | <numeric>
  -------
  seqinfo: 455 sequences from an unspecified genome

Removing the "chr" also from the cov object (in correctcov_stub()) solves the issue:

map = gr.sub(map)
gc  = gr.sub(gc)
cov = gr.sub(cov)  # Remove "chr" 

but it would be nice to be able to choose whether to keep or not the "chr" notation. Thank you in advance.

zining01 commented 1 year ago

Hi, thanks for pointing out this issue! I pushed a fix to the dev branch: https://github.com/mskilab/fragCounter/tree/dev

If running from command line, you should be able to retain the chr prefix using the flag --chr.sub FALSE. If you have a chance to give it a try please let us know how it goes!

pblaney commented 1 year ago

Hello @zining01,

I recently attempted using fragCounter on an hg38 aligned BAM while use the --chrsub FALSE parameter. While there were not issues in execution, I did notice my cov.rds and cov.corrected.bw both lack the 'chr' prefix.

I started to work back through the code based on the final conditional at line 665:

  if (has.chr && !chr.sub) { cov = gUtils::gr.chr(cov) }

Everything seems to checkout but then I noticed in the frag executable, that the new parameter --chrsub is not assigned when passed to the fragCounter() function:

out = fragCounter(bam = opt$bam, cov = opt$cov, window = opt$window, gc.rds.dir = opt$gcmapdir, map.rds.dir = opt$gcmapdir, minmapq = opt$minmapq, paired = opt$paired, outdir = opt$outdir, exome = opt$exome, use.skel = opt$use.skel, skeleton = opt$skeleton, opt$chrsub)

I think the parameter should work fine but given the default of this being #' @param chr.sub boolean if TRUE, remove 'chr' prefix on seqnames. default TRUE in the source of fragCounter(), the user-provided value was ignored.

I can give it a test to be sure. What do you think?

Patrick

zining01 commented 1 year ago

Hi Patrick,

I think you're right - sorry about the oversight and thanks for helping us debug. I updated the frag executable on the dev branch again... please let me know if this fixes your issue!

pblaney commented 1 year ago

Hello Zi-Ning,

Thank you for getting to this so quickly. I can confirm the changes worked. Both the output RDS and bigWig file have the appropriate 'chr' notation in the output when providing --chrsub FALSE argument from the command line.

I think it's safe to consider this issue closed.

Thanks again, Patrick

zining01 commented 1 year ago

Great, thanks again for your help with this.

l0ka commented 1 year ago

Thank you Zi-Ning!