jorainer / ensembldb

This is the ensembldb development repository.
https://jorainer.github.io/ensembldb
33 stars 10 forks source link

Some Granges with genomeToProtein: "Error: subscript contains NAs" #69

Closed ibwoo closed 6 years ago

ibwoo commented 6 years ago

Strange error here, I can't get to the bottom of this one either. The genomeToProtein function handles unmapped ranges by returning -1, and this works for me in most cases. I have a fair number of ranges, however, that work on their own and map to actual proteins, but when paired with certain other mapping ranges, will throw the error "subscript contains NAs".

I've tried to recreate this below. It's a messy recreation, but I'm pretty dumbfounded by the error and wanted to demonstrate all scenarios.

library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75

# Good range, works on its own
GRanges(Rle(c("chr19")),
        IRanges(51920103, width=1, names = "good1")) %>% genomeToProtein(db = edb)

# Pair of good ranges, work together
GRanges(Rle(c("chr19")),
        IRanges(c(51920103,51957556), width=1, names = c("good1","good2"))) %>% genomeToProtein(db = edb)

# Problematic range, but works on its own
GRanges(Rle(c("chr19")),
        IRanges(51919998, width=1, names = "bad1")) %>% genomeToProtein(db = edb)

# Problematic range with good1 range, throws error
GRanges(Rle(c("chr19")),
        IRanges(c(51920103,51919998), width=1, names = c("good1","bad1"))) %>% genomeToProtein(db = edb)

# Problematic range with good2 range, works fine together (!?)
GRanges(Rle(c("chr19")),
        IRanges(c(51957556,51919998), width=1, names = c("good2","bad1"))) %>% genomeToProtein(db = edb)

# Unmapping range, but handles failure with "-1"
GRanges(Rle(c("chr19")),
        IRanges(51919894, width=1, names = "notmapping1")) %>% genomeToProtein(db = edb)

# Unmapping range with problematic range, handles failure and works fine
GRanges(Rle(c("chr19")),
        IRanges(c(51919894,51919998), width=1, names = c("notmapping1","bad1"))) %>% genomeToProtein(db = edb)

I appreciate your support with this package, which has been very useful. Thanks

jorainer commented 6 years ago

Thanks for the code examples. I will look at it.

jorainer commented 6 years ago

Should be fixed now. I'll wait to push this to the Bioconductor git repository until the problems with Rsamtools (missing export of the path method) are fixed and I can run a full check.

You can install the fix using devtools::install_github("jotsetung/ensembldb")

Btw, I assume you are setting seqlevelsStyle(edb) <- "UCSC" in your code examples above.

ibwoo commented 6 years ago

That did the trick, it works well thank you