mskilab-org / fragCounter

GC and mappability corrected fragment coverage for paired end whole genome sequencing
MIT License
7 stars 11 forks source link

Error in multicoco #7

Closed alhafidzhamdan closed 3 years ago

alhafidzhamdan commented 3 years ago

Hi there, I used your gc1000.rds and map1000.rds files and liftover function to convert those to hg38 equivalent (same file names). I ran fragCounter using:

frag -b $ALIGNED_BAM_FILE_TUMOR -d $GCMAP -w 1000 -o ${FRAG_RESULT}/${TYPE}/${BATCH}/${PATIENT_ID}

And i got this error:

Finished computing coverage, and making GRanges
Finished acquiring coverage
Loaded GC and mappability
length cov is 3219322, length gc is 2889893, length map is 2889893
Synced coverage, GC, and mappability
Modified gc / mappability correction
Converting to data.table
Grouping intervals
Defining 100 fold collapsed ranges
Presegmenting at  NA  bp scale 
Error in `[.data.table`(cov.dt, , list(chr = seqnames[1], start = min(start),  : 
  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed.
Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table
In addition: Warning messages:
1: replacing previous import ‘GenomicRanges::shift’ by ‘data.table::shift’ when loading ‘bamUtils’ 
2: replacing previous import ‘GenomicAlignments::second’ by ‘data.table::second’ when loading ‘bamUtils’ 
3: replacing previous import ‘GenomicAlignments::last’ by ‘data.table::last’ when loading ‘bamUtils’ 
4: replacing previous import ‘GenomicAlignments::first’ by ‘data.table::first’ when loading ‘bamUtils’ 
5: In readChar(check_gz, 4) :
  text connection used with readChar(), results may be incorrect
6: In max(width(cov)) : no non-missing arguments to max; returning -Inf
7: In cat("Presegmenting at ", as.integer(WID * base^(numlevs)), " bp scale \n") :
  NAs introduced by coercion to integer range
Execution halted

I infer that Presegmenting at NA bp scale probably can be solved by changing as.integer to as.numeric in

cat('Presegmenting at ', as.integer(WID*base^(numlevs)), ' bp scale \n')

But i'm not sure about the error after it, ie Error in `[.data.table`(cov.dt, , list(chr = seqnames[1], start = min(start), : 'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed. Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table

Any clues to this?

A

xtYao commented 3 years ago

Thanks a lot for raising this issue. I am trying to replicate it.

alhafidzhamdan commented 3 years ago

Hi @xtYao ,any luck so far? Let me know if I can help in any way! Very interested in trying Jabba and this is one of the bottlenecks for me. Much appreeciated.

xtYao commented 3 years ago

Hi there, I used your gc1000.rds and map1000.rds files and liftover function to convert those to hg38 equivalent (same file names). I ran fragCounter using:

frag -b $ALIGNED_BAM_FILE_TUMOR -d $GCMAP -w 1000 -o ${FRAG_RESULT}/${TYPE}/${BATCH}/${PATIENT_ID}

And i got this error:

Finished computing coverage, and making GRanges
Finished acquiring coverage
Loaded GC and mappability
length cov is 3219322, length gc is 2889893, length map is 2889893
Synced coverage, GC, and mappability
Modified gc / mappability correction
Converting to data.table
Grouping intervals
Defining 100 fold collapsed ranges
Presegmenting at  NA  bp scale 
Error in `[.data.table`(cov.dt, , list(chr = seqnames[1], start = min(start),  : 
  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed.
Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table
In addition: Warning messages:
1: replacing previous import ‘GenomicRanges::shift’ by ‘data.table::shift’ when loading ‘bamUtils’ 
2: replacing previous import ‘GenomicAlignments::second’ by ‘data.table::second’ when loading ‘bamUtils’ 
3: replacing previous import ‘GenomicAlignments::last’ by ‘data.table::last’ when loading ‘bamUtils’ 
4: replacing previous import ‘GenomicAlignments::first’ by ‘data.table::first’ when loading ‘bamUtils’ 
5: In readChar(check_gz, 4) :
  text connection used with readChar(), results may be incorrect
6: In max(width(cov)) : no non-missing arguments to max; returning -Inf
7: In cat("Presegmenting at ", as.integer(WID * base^(numlevs)), " bp scale \n") :
  NAs introduced by coercion to integer range
Execution halted

I infer that Presegmenting at NA bp scale probably can be solved by changing as.integer to as.numeric in

cat('Presegmenting at ', as.integer(WID*base^(numlevs)), ' bp scale \n')

But i'm not sure about the error after it, ie Error in `[.data.table`(cov.dt, , list(chr = seqnames[1], start = min(start), : 'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed. Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table

Any clues to this?

A

Hi,

So sorry for the delay.

I think we ran into this problem: https://stackoverflow.com/questions/3252057/data-table-and-by-must-evaluate-to-list-error

At line https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L97 We try to aggregate by a column whose name is constructed within the expression, which is a character object and your data.table doesn't like that since it's not a list.

I tried wrapping the column names by which to aggregate with list() and it also works for me just like without. So I am wondering what is the cause of this discrepancy we are seeing between yours and mine. I am using R 4.0.2 and data.table 1.13.2.

Xiaotong

alhafidzhamdan commented 3 years ago

Hi there- thank you for getting back. I am not sure what's happening here but i've used gc corrected calls from PURPLE with JaBbA and seems to work OK. A

zlskidmore commented 2 years ago

I just wanted to chime in, I'm seeing the same error using some test data I have:

Presegmenting at  NA  bp scale 
Error in `[.data.table`(cov.dt, , list(chr = seqnames[1], start = min(start),  : 
  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed.
Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table
In addition: Warning messages:

I did go ahead and re-build/re-install fragCounter after modifying lines https://github.com/mskilab/fragCounter/blob/d2be44d2e5562b03e4b967d8fc9fdc0ab79a5497/R/fragCounter.R#L98 https://github.com/mskilab/fragCounter/blob/d2be44d2e5562b03e4b967d8fc9fdc0ab79a5497/R/fragCounter.R#L720

to be

tmp.cov = seg2gr(cov.dt[,list(chr = seqnames[1], start = min(start), end = max(end), strand = strand[1], reads = mean(reads, na.rm = T)), by = list(get(paste("lev", numlevs, sep = '')))][end>start, ], seqlengths = sl)

as suggested by @xtYao

After that modification I get the error

Presegmenting at  NA  bp scale
Error: subscript contains invalid names
In addition: Warning messages:
1: replacing previous import 'GenomicRanges::shift' by 'data.table::shift' when loading 'bamUtils'
2: replacing previous import 'GenomicAlignments::second' by 'data.table::second' when loading 'bamUtils'
3: replacing previous import 'GenomicAlignments::last' by 'data.table::last' when loading 'bamUtils'
4: replacing previous import 'GenomicAlignments::first' by 'data.table::first' when loading 'bamUtils'
5: closing unused connection 3 (/storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam)

Note that everything appears to run successfully on the test data within the package, just some warnings

frag -b extdata/chr21.bam -d extdata/gcMAP21/ -w 200
Converting to GRanges
Made GRanges
Warning messages:
1: replacing previous import 'GenomicRanges::shift' by 'data.table::shift' when loading 'bamUtils'
2: replacing previous import 'GenomicAlignments::second' by 'data.table::second' when loading 'bamUtils'
3: replacing previous import 'GenomicAlignments::last' by 'data.table::last' when loading 'bamUtils'
4: replacing previous import 'GenomicAlignments::first' by 'data.table::first' when loading 'bamUtils'
5: closing unused connection 3 (extdata/chr21.bam)
6: In value[[3L]](cond) : DNACopy error moving on without segmenting
7: In FUN(as.data.frame(grs[[i]]), fields, seg = seg) :
  Not enough samples for loess fitting - check to see if missing or truncated data?

I've a docker image i've been using to test that is here: https://hub.docker.com/repository/docker/zlskidmore/fragcounter

Any ideas on what is going on, i'm trying to run jabba eventually, I don't quite understand why the internal package test data works on mine doesn't, other than it must have something to do with the .rds files?

xtYao commented 2 years ago

Hi Zachary,

Looking at the output message, I think my command there to identify the bin width of your input coverage data did not work as intended, since it gave “NA bp”.

Can you send us maybe an excerpt of your coverage input file?

Cheers, Xiaotong

Get Outlook for iOShttps://aka.ms/o0ukef


From: Zachary Skidmore @.> Sent: Tuesday, July 20, 2021 12:19:27 PM To: mskilab/fragCounter @.> Cc: Xiaotong Yao @.>; Mention @.> Subject: Re: [mskilab/fragCounter] Error in multicoco (#7)

I just wanted to chime in, I'm seeing the same error using some test data I have:

Presegmenting at NA bp scale Error in [.data.table(cov.dt, , list(chr = seqnames[1], start = min(start), : 'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed. Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table In addition: Warning messages:

I did go ahead and re-build/re-install fragCounter after modifying lines https://github.com/mskilab/fragCounter/blob/d2be44d2e5562b03e4b967d8fc9fdc0ab79a5497/R/fragCounter.R#L98 https://github.com/mskilab/fragCounter/blob/d2be44d2e5562b03e4b967d8fc9fdc0ab79a5497/R/fragCounter.R#L720

to be

tmp.cov = seg2gr(cov.dt[,list(chr = seqnames[1], start = min(start), end = max(end), strand = strand[1], reads = mean(reads, na.rm = T)), by = list(get(paste("lev", numlevs, sep = '')))][end>start, ], seqlengths = sl)

as suggested by @xtYaohttps://github.com/xtYao

After that modification I get the error

Presegmenting at NA bp scale Error: subscript contains invalid names In addition: Warning messages: 1: replacing previous import 'GenomicRanges::shift' by 'data.table::shift' when loading 'bamUtils' 2: replacing previous import 'GenomicAlignments::second' by 'data.table::second' when loading 'bamUtils' 3: replacing previous import 'GenomicAlignments::last' by 'data.table::last' when loading 'bamUtils' 4: replacing previous import 'GenomicAlignments::first' by 'data.table::first' when loading 'bamUtils' 5: closing unused connection 3 (/storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam)

Note that everything appears to run successfully on the test data within the package, just some warnings

frag -b extdata/chr21.bam -d extdata/gcMAP21/ -w 200

Converting to GRanges Made GRanges Warning messages: 1: replacing previous import 'GenomicRanges::shift' by 'data.table::shift' when loading 'bamUtils' 2: replacing previous import 'GenomicAlignments::second' by 'data.table::second' when loading 'bamUtils' 3: replacing previous import 'GenomicAlignments::last' by 'data.table::last' when loading 'bamUtils' 4: replacing previous import 'GenomicAlignments::first' by 'data.table::first' when loading 'bamUtils' 5: closing unused connection 3 (extdata/chr21.bam) 6: In value[3L] : DNACopy error moving on without segmenting 7: In FUN(as.data.frame(grs[[i]]), fields, seg = seg) : Not enough samples for loess fitting - check to see if missing or truncated data?

I've a docker image i've been using to test that is here: https://hub.docker.com/repository/docker/zlskidmore/fragcounter

Any ideas on what is going on, i'm trying to run jabba eventually, I don't quite understand why the internal package test data works on mine doesn't, other than it must have something to do with the .rds files?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/mskilab/fragCounter/issues/7#issuecomment-883522877, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABVYZHMZL6BWYKLCVLSEGYTTYWOY7ANCNFSM4TBF2HJA.

zlskidmore commented 2 years ago

Sure!

My command is

frag -d /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/GRCh38/tmp/ -w 200 -b /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam -o /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA

the -d dir contains 2 files:

> gc200
GRanges object with 15441361 ranges and 1 metadata column:
             seqnames            ranges strand |     score
                <Rle>         <IRanges>  <Rle> | <numeric>
         [1]     chr1             1-200      * |         0
         [2]     chr1           201-400      * |         0
         [3]     chr1           401-600      * |         0
         [4]     chr1           601-800      * |         0
         [5]     chr1          801-1000      * |         0
         ...      ...               ...    ... .       ...
  [15441357]     chrY 57226601-57226800      * |         0
  [15441358]     chrY 57226801-57227000      * |         0
  [15441359]     chrY 57227001-57227200      * |         0
  [15441360]     chrY 57227201-57227400      * |         0
  [15441361]     chrY 57227401-57227415      * |         0
  -------
  seqinfo: 455 sequences from an unspecified genome
> map200
GRanges object with 15441361 ranges and 1 metadata column:
             seqnames            ranges strand |     score
                <Rle>         <IRanges>  <Rle> | <numeric>
         [1]     chr1             1-200      * |         0
         [2]     chr1           201-400      * |         0
         [3]     chr1           401-600      * |         0
         [4]     chr1           601-800      * |         0
         [5]     chr1          801-1000      * |         0
         ...      ...               ...    ... .       ...
  [15441357]     chrY 57226601-57226800      * |         0
  [15441358]     chrY 57226801-57227000      * |         0
  [15441359]     chrY 57227001-57227200      * |         0
  [15441360]     chrY 57227201-57227400      * |         0
  [15441361]     chrY 57227401-57227415      * |         0
  -------
  seqinfo: 24 sequences from an unspecified genome

and the bam file excerpt is (sequence are replaced with just A here):

A00580:129:HVV2JDSXX:3:1571:20202:15562 99  chr22   10510012    0   151M    =   10510181    320 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? MC:Z:151M   PG:Z:MarkDuplicates MQ:i:0AS:i:151  XS:i:151    MD:Z:151    NM:i:0  RG:Z:2901188444
A00580:129:HVV2JDSXX:3:2360:28067:14340 99  chr22   10510033    0   151M    =   10510220    341 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  MC:Z:124M3D27M  PG:Z:MarkDuplicates MQ:i:0AS:i:151  XS:i:151    MD:Z:151    NM:i:0  RG:Z:2901188444
A00580:129:HVV2JDSXX:2:1611:14308:25770 99  chr22   10510041    0   151M    =   10510177    287 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ??????????????????????????????????????????????????????????????????+???????????????????????????????+???????????????????????????????????????????????????? MC:Z:151M   PG:Z:MarkDuplicates MQ:i:0AS:i:151  XS:i:151    MD:Z:151    NM:i:0  RG:Z:2901188572

I did not supply a -c is that what you meant by coverage data, or were you referring to the bam?

xtYao commented 2 years ago

Oops, sorry, this is fragCounter, you are generating the coverage. I mistakenly thought this was an issue in JaBbA.

Could you send me some more lines of the standard output preceding “Presegmenting… NA bp”?

Get Outlook for iOShttps://aka.ms/o0ukef


From: Zachary Skidmore @.> Sent: Tuesday, July 20, 2021 12:48:56 PM To: mskilab/fragCounter @.> Cc: Xiaotong Yao @.>; Mention @.> Subject: Re: [mskilab/fragCounter] Error in multicoco (#7)

Sure!

My command is

frag -d /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/GRCh38/tmp/ -w 200 -b /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam -o /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA

the -d dir contains 2 files:

gc200 GRanges object with 15441361 ranges and 1 metadata column: seqnames ranges strand | score

| [1] chr1 1-200 * | 0 [2] chr1 201-400 * | 0 [3] chr1 401-600 * | 0 [4] chr1 601-800 * | 0 [5] chr1 801-1000 * | 0 ... ... ... ... . ... [15441357] chrY 57226601-57226800 * | 0 [15441358] chrY 57226801-57227000 * | 0 [15441359] chrY 57227001-57227200 * | 0 [15441360] chrY 57227201-57227400 * | 0 [15441361] chrY 57227401-57227415 * | 0 ------- seqinfo: 455 sequences from an unspecified genome map200 GRanges object with 15441361 ranges and 1 metadata column: seqnames ranges strand | score | [1] chr1 1-200 * | 0 [2] chr1 201-400 * | 0 [3] chr1 401-600 * | 0 [4] chr1 601-800 * | 0 [5] chr1 801-1000 * | 0 ... ... ... ... . ... [15441357] chrY 57226601-57226800 * | 0 [15441358] chrY 57226801-57227000 * | 0 [15441359] chrY 57227001-57227200 * | 0 [15441360] chrY 57227201-57227400 * | 0 [15441361] chrY 57227401-57227415 * | 0 ------- seqinfo: 24 sequences from an unspecified genome

and the bam file excerpt is (sequence are replaced with just A here):

A00580:129:HVV2JDSXX:3:1571:20202:15562 99 chr22 10510012 0 151M = 10510181 320 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? MC:Z:151M PG:Z:MarkDuplicates MQ:i:0AS:i:151 XS:i:151 MD:Z:151 NM:i:0 RG:Z:2901188444 A00580:129:HVV2JDSXX:3:2360:28067:14340 99 chr22 10510033 0 151M = 10510220 341 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? MC:Z:124M3D27M PG:Z:MarkDuplicates MQ:i:0AS:i:151 XS:i:151 MD:Z:151 NM:i:0 RG:Z:2901188444 A00580:129:HVV2JDSXX:2:1611:14308:25770 99 chr22 10510041 0 151M = 10510177 287 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ??????????????????????????????????????????????????????????????????+???????????????????????????????+???????????????????????????????????????????????????? MC:Z:151M PG:Z:MarkDuplicates MQ:i:0AS:i:151 XS:i:151 MD:Z:151 NM:i:0 RG:Z:2901188572

I did not supply a -c is that what you meant by coverage data, or were you referring to the bam?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/mskilab/fragCounter/issues/7#issuecomment-883542058, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABVYZHNUUHAA7OVEL55FHI3TYWSHRANCNFSM4TBF2HJA.

zlskidmore commented 2 years ago

No Problem:

Calling samtools view  -f 0x02 -F 0x10 /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam -q 1 | cut -f "3,4,9"
Starting fragment count on /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam with bin size 200 and min mapQ 1 and   insert size limit 10000 with midpoint set to TRUE
bam.cov.tile.st  /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam chunk 1 num fragments processed 1e+06
mean cov: 0.1 per bin, estimated tot fragments: 1.13 million fragments, processing 192035.6 fragments/second
time elapsed: 0 hours, estimated time remaining: 0 hours , estimated total time 0 hours
bam.cov.tile.st  /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam chunk 2 num fragments processed 2e+06
mean cov: 0.1 per bin, estimated tot fragments: 2.25 million fragments, processing 218662.7 fragments/second
time elapsed: 0 hours, estimated time remaining: 0 hours , estimated total time 0 hours
bam.cov.tile.st  /storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam chunk 3 num fragments processed 3e+06
mean cov: 0.2 per bin, estimated tot fragments: 3.36 million fragments, processing 229851.6 fragments/second
time elapsed: 0 hours, estimated time remaining: 0 hours , estimated total time 0 hours
Finished computing coverage, and making GRanges
Finished acquiring coverage
Loaded GC and mappability
length cov is 16088445, length gc is 15441361, length map is 15441361
Synced coverage, GC, and mappability
Modified gc / mappability correction
Converting to data.table
Grouping intervals
Defining 500 fold collapsed ranges
Presegmenting at  NA  bp scale
Error: subscript contains invalid names
In addition: Warning messages:
1: replacing previous import 'GenomicRanges::shift' by 'data.table::shift' when loading 'bamUtils'
2: replacing previous import 'GenomicAlignments::second' by 'data.table::second' when loading 'bamUtils'
3: replacing previous import 'GenomicAlignments::last' by 'data.table::last' when loading 'bamUtils'
4: replacing previous import 'GenomicAlignments::first' by 'data.table::first' when loading 'bamUtils'
5: closing unused connection 3 (/storage1/fs1/rgovindan/Active/Analysis/WGS/bams/tumor/bedpe/fragCounter/SCLC32_R_NA/out.bam)
6: In bamUtils::bam.cov.tile(bam, window = window, reference = reference,  :
  strange data.table "by" fail .. using rbindlist lapply instead as alternative
7: In max(width(cov)) : no non-missing arguments to max; returning -Inf
8: In cat("Presegmenting at ", as.integer(WID * base^(numlevs)), " bp scale \n") :
  NAs introduced by coercion to integer range
9: In min(start) : no non-missing arguments to min; returning Inf
10: In max(end) : no non-missing arguments to max; returning -Inf
Execution halted
zlskidmore commented 2 years ago

That is with the addition of list to those two lines of code I mentioned above. I can remove that and give you the other error in full as well if that will help?

zlskidmore commented 2 years ago

Hi @xtYao, any insights on what the problem might be, or where the error may be originating from in the source code?

jamesdalg commented 2 years ago

I am also having this issue:

mean cov: 39.3 per bin, estimated tot fragments: 616 million fragments, processing 93033.34 fragments/second
time elapsed: 1.84 hours, estimated time remaining: 0 hours , estimated total time 1.84 hours
Finished computing coverage, and making GRanges
Finished acquiring coverage
Loaded GC and mappability
length cov is 15685849, length gc is 240650, length map is 240650
Synced coverage, GC, and mappability
Modified gc / mappability correction
Converting to data.table
Grouping intervals
Defining 500 fold collapsed ranges
Presegmenting at  NA  bp scale
Error in `[.data.table`(cov.dt, , list(chr = seqnames[1], start = min(start),  :
  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed.
Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table
In addition: Warning messages:
1: replacing previous import ‘GenomicRanges::shift’ by ‘data.table::shift’ when loading ‘bamUtils’
2: replacing previous import ‘GenomicAlignments::second’ by ‘data.table::second’ when loading ‘bamUtils’
3: replacing previous import ‘GenomicAlignments::last’ by ‘data.table::last’ when loading ‘bamUtils’
4: replacing previous import ‘GenomicAlignments::first’ by ‘data.table::first’ when loading ‘bamUtils’
5: closing unused connection 3 (../PAMHYN_Tumor.realigned.md.bam)
6: In max(width(cov)) : no non-missing arguments to max; returning -Inf
7: In cat("Presegmenting at ", as.integer(WID * base^(numlevs)), " bp scale \n") :
  NAs introduced by coercion to integer range

Looks like data.table broke something on an update. What version of data.table do you recommend to use?