tidyomics / plyranges

A grammar of genomic data transformation
https://tidyomics.github.io/plyranges/

Feature request "pull" function #94

Open eggrandio opened 2 years ago

eggrandio commented 2 years ago

Hello,

I wonder if there is a "tidy" way of retrieving metadata from GRanges objects. I haven't found an equivalent of dplyr's pull function.

Right now I find it quite cumbersome to filter and retrieve another metadata column as a vector.

I am doing this, but maybe there is a cleaner way (is there another way of converting from S4 to vector?):

gr <- GRanges(seqnames = "chr1", strand = c("+", "-", "+"),
              ranges = IRanges(start = c(1,3,5), width = 3)) %>% 
  mutate(score = c(0.1, 0.5, 0.3),
         peak = c("a", "b", "c"))

output <- gr %>%
  filter(score > 0.2) %>%
  select(peak, .drop_ranges = TRUE) %>%
  as.data.frame() %>%
  unlist() %>%
  unname()

Having a pull function would simplify this.

snystrom commented 2 years ago

This works and is probably the easiest solution:

# S3 method for dplyr::pull(): convert to a data.frame, then extract the column
pull.GenomicRanges <- function(x, ...) {
  dplyr::pull(data.frame(x), ...)
}

I can't really think of any features that would be unique to GRanges objects since it's really just extracting a vector.
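A minimal sketch of how that method could be used on the example from the original post. It assumes plyranges and dplyr are installed; the method is defined in the user's session and is not part of plyranges, and it uses as.data.frame() for the conversion.

```r
library(plyranges)  # also attaches GenomicRanges and IRanges

# User-defined S3 method (an assumption for illustration, not a plyranges export):
# convert the ranges to a data.frame and let dplyr extract the column.
pull.GenomicRanges <- function(.data, ...) {
  dplyr::pull(as.data.frame(.data), ...)
}

gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(1, 3, 5), width = 3),
              strand = c("+", "-", "+"),
              score = c(0.1, 0.5, 0.3),
              peak = c("a", "b", "c"))

# Keep ranges with score > 0.2, then extract the peak column as a plain vector
kept <- filter(gr, score > 0.2)
peaks <- dplyr::pull(kept, peak)

# Alternative without any new method: `$` reads a metadata column directly
peaks_base <- kept$peak
```

Both peaks and peaks_base should hold the peak labels of the two ranges that pass the filter, so for a single column the `$` accessor may already be enough.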

eggrandio commented 2 years ago

Thanks! Yes, that's probably the easiest way. I am still getting familiar with plyranges; I thought some functions were just wrappers to ease the use of GenomicRanges.

For example, I was expecting to have a rename function to rename metadata column names. Is there a tidy way of doing it? Currently I use

%>% `mcols<-`(value = list("new_name" = .$old_name))

but that could be simplified, and probably is not very efficient.

lawremi commented 2 years ago

A pull() method for DataFrame would solve the original issue.

For renaming, you could use rename() on the DataFrame returned by mcols(). Then you just need a pipeable way to set the mcols, for example a set_mcols() function that just calls `mcols<-`().
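A rough sketch of that idea, under some assumptions: set_mcols() and rename_mcol() are hypothetical helpers written here for illustration, not functions from plyranges or GenomicRanges. The renaming is done with base names() on the DataFrame rather than rename(), since this sketch does not assume a rename() method for DataFrame exists.

```r
library(GenomicRanges)
library(magrittr)  # for %>%

# Hypothetical helper: a pipe-friendly wrapper around the `mcols<-`() replacement function
set_mcols <- function(x, value) {
  mcols(x) <- value
  x
}

# Hypothetical helper: rename one metadata column, leaving the ranges untouched
rename_mcol <- function(x, old, new) {
  md <- mcols(x)
  stopifnot(old %in% names(md))
  names(md)[names(md) == old] <- new
  set_mcols(x, md)
}

gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(1, 3, 5), width = 3),
              strand = c("+", "-", "+"),
              score = c(0.1, 0.5, 0.3),
              peak = c("a", "b", "c"))

renamed <- gr %>% rename_mcol("peak", "peak_id")
names(mcols(renamed))  # "score" "peak_id"
```

Unlike the `mcols<-`(value = list(...)) one-liner, this keeps the other metadata columns (here score) in place and only changes the name.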

Perhaps @sa-lee could weigh in.