rafalab / bumphunter

bumphunter
15 stars 14 forks source link

Error in `matchGenes` when selecting subject index? #14

Open JonBarenboim opened 7 years ago

JonBarenboim commented 7 years ago

matchGenes selects the index of the matched gene in the subject as so:

if (type == "fiveprime") {
        map <- nearest(x, resize(subject, width = 1))
}
else {
    map <- nearest(x, subject)
}
ind <- which(!is.na(map))
[...]
for (j in ind) {
     i <- map[ind][j]

The last line (first line of the for loop) should, if I'm not mistaken, be i <- map[j] since j is already the correct index of the map

The current code causes an error if nearest fails to map some regions and map contains NAs:

> library(GenomicFeatures)
> library(bumphunter)
> dmrs <- read.csv(path/to/file)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> annotations <- annotateTranscripts(txdb)
> map <- nearest(makeGRangesFromDataFrame(dmrs), annotations)
> which(is.na(map))[1:10]
[1]  729 1379 1624 1927 1973 1986 1987 2031 2099 2111
> a <- matchGenes(dmrs[700:725,], annotations)
> b <- matchGenes(dmrs[725:730,], annotations)
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript contains NAs or out-of-bounds indices
9: stop("subscript contains NAs or out-of-bounds indices")
8: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
7: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
6: normalizeSingleBracketSubscript(i, x)
5: extractROWS(x, i)
4: extractROWS(x, i)
3: subject[i, ]
2: subject[i, ]
1: matchGenes(dmrs[725:730, ], annotations)
> ind <- which(!is.na(map))
> for (j in ind[725:730]) print(map[ind][j])
[1] 73156
[1] 17050
[1] 21892
[1] 51693
[1] 37926
[1] 19178
> for(j in ind[725:730]) print(map[j])
[1] 73156
[1] 17050
[1] 21892
[1] 51693
[1] 53380
[1] 37926
> map[725:731]
[1] 73156 17050 21892 51693    NA 53380 37926
lcolladotor commented 7 years ago

Hi @JonBarenboim,

Thanks to @sampoll, this repo is back in sync with Bioconductor. For https://github.com/rafalab/bumphunter/pull/17 I did some changes to matchGenes() for other reasons and I looked into this issue too to see if I could address it as well but I couldn't. Basically, if you provide a working example I think that @sampoll will be able to go through it. If your working example is small enough, it could be added as a unit test to the package.

You could be right about i <- map[ind][j] but it's hard to double check without having access to dmrs <- read.csv(path/to/file) (or a dummy file).

Best, Leo