Open alexyfyf opened 5 years ago
Hi there, I'm not sure exactly what a full join would look like? Could you provide a simple example? If it makes sense you are more than welcome to make a pull request. Look at the code for https://github.com/sa-lee/plyranges/blob/master/R/ranges-overlap-joins-outer.R and https://github.com/sa-lee/plyranges/blob/master/R/ranges-overlap-find.R to see how I've constructed join_overlap_left()
Hi there, I'm not sure exactly what a full join would look like? Could you provide a simple example? If it makes sense you are more than welcome to make a pull request. Look at the code for https://github.com/sa-lee/plyranges/blob/master/R/ranges-overlap-joins-outer.R and https://github.com/sa-lee/plyranges/blob/master/R/ranges-overlap-find.R to see how I've constructed
join_overlap_left()
Thank you for your reply. I'll have a look and try to make a pull request. The example would like this. If there's any function from your package could do this, please let me know.
gr1 <- GRanges(seqnames=Rle(c('chr1', 'chr2', 'chr3'), c(3, 3, 4)),
IRanges(1:10, width=5),
strand='-',
score=101:110, GC=runif(10))
gr2 <- GRanges(seqnames=Rle(c('chr1', 'chr2', 'chr3'), c(3, 4, 3)),
IRanges(6:15, width=10),
strand='-',
score=21:30)
join_overlap_full(gr1,gr2)
The output will be like: gr1
GRanges object with 10 ranges and 2 metadata columns:
seqnames ranges strand | score GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1] chr1 1-5 - | 101 0.930739698465914
[2] chr1 2-6 - | 102 0.566913234768435
[3] chr1 3-7 - | 103 0.682077450212091
[4] chr2 4-8 - | 104 0.889251226792112
[5] chr2 5-9 - | 105 0.0200071565341204
[6] chr2 6-10 - | 106 0.208122601732612
[7] chr3 7-11 - | 107 0.0573086701333523
[8] chr3 8-12 - | 108 0.578096957644448
[9] chr3 9-13 - | 109 0.802684629801661
[10] chr3 10-14 - | 110 0.71569463587366
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
gr2
GRanges object with 10 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
[1] chr1 6-15 - | 21
[2] chr1 7-16 - | 22
[3] chr1 8-17 - | 23
[4] chr2 9-18 - | 24
[5] chr2 10-19 - | 25
[6] chr2 11-20 - | 26
[7] chr2 12-21 - | 27
[8] chr3 13-22 - | 28
[9] chr3 14-23 - | 29
[10] chr3 15-24 - | 30
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
The full_joined
GRanges object with 17 ranges and 3 metadata columns:
seqnames ranges strand | score.y score.x GC.x
<Rle> <IRanges> <Rle> | <integer> <integer> <numeric>
[1] chr1 2-6 - | 21 102 0.566913234768435
[2] chr1 3-7 - | 21 103 0.682077450212091
[3] chr1 3-7 - | 22 103 0.682077450212091
[4] chr2 5-9 - | 24 105 0.0200071565341204
[5] chr2 6-10 - | 24 106 0.208122601732612
... ... ... ... . ... ... ...
[13] chr3 8-12 - | <NA> 108 0.578096957644448
[14] chr1 8-17 - | 23 <NA> <NA>
[15] chr2 11-20 - | 26 <NA> <NA>
[16] chr2 12-21 - | 27 <NA> <NA>
[17] chr3 15-24 - | 30 <NA> <NA>
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
Hi, I was wondering if you've made progress on this? I'd really like to be able to do full joins in plyranges!
Hi, no sorry! I'm pretty busy with other projects at the moment, but will see if I have time over the next couple of months to implement it.
No problem! Thanks so much for creating plyranges in the first place. It's an awesome package, I use it all the time.
I use this function to full_join GRanges, might not fit in everyone's case (or maybe everyone has a different definition full_join). I'm sure someone can make it better, so I would like to share here.
join_overlap_full <- function(x,y){
options(stringsAsFactors = F)
hit <- findOverlaps(x,y)
colnames(mcols(x)) <- paste0(colnames(mcols(x)),".x")
colnames(mcols(y)) <- paste0(colnames(mcols(y)),".y")
cns.x <- mcols(x) %>% colnames()
cns.y <- mcols(y) %>% colnames()
z <- x[queryHits(hit),]
hitz <- cbind(mcols(y)[subjectHits(hit),],
mcols(x)[queryHits(hit),])
colnames(hitz) <- c(cns.y,cns.x)
mcols(z) <- hitz
uniqex <- x[-queryHits(hit),]
uniqey <- y[-subjectHits(hit),]
mcols(uniqex)[, (length(cns.x)+1):(length(cns.x)+length(cns.y))] <- NA
colnames(mcols(uniqex)) <- c(cns.x,cns.y)
mcols(uniqex) <- mcols(uniqex)[,c((length(cns.x)+1):(length(cns.x)+length(cns.y)),1:(length(cns.x)))]
mcols(uniqey)[, (length(cns.y)+1):(length(cns.x)+length(cns.y))] <- NA
colnames(mcols(uniqey)) <- c(cns.y,cns.x)
res <- c(z, uniqex,uniqey)
mcols(res) <- mcols(res) %>% data.frame() %>% mutate_all(as.character())
return(res)
}
This looks like it's on the right track! Take a look at https://github.com/sa-lee/plyranges/blob/ce998131510d1da7b3c9e47deba5946b5c33ab1d/R/ranges-overlap-joins-outer.R#L55-L86 which provides the template for doing the left outer join. You're basically adding an extra step to pull in the right hand side, i.e. only_right[subjectHits(hits)] <- FALSE
. Happy for you to be added as a ctb if you want to put this in as PR with some unit tests :D
Hi @sa-lee , I will have a try.
So this overlaop_joins_outer
is actually the function join_overlap_left
?
I didn't see any other function in this wrapper. Do you think I should add my join_overlap_full
on top of this?
Sounds a bit confusing.
@sa-lee Have you resolved this issue? I don't understand if the function provided by @alexyfyf is valid or not.
At least it is valid in the test case, again, I am not sure if everyone has the same definition of full_join GRanges.
This is a fantastic package, and it's one of the critical dependencies for my own package in CRAN. Having a full_join function would be good addition, and round out the 'join' operations.
The previous function doesn't sort out the metadata right, tacking on .x and .y labels, even if the metadata column name is unique to one group.
The other way you could do a full join for objects GR1 and GR2 is as follows: uniqueGR2 <- filter_by_non_overlaps(GR2, GR1) overlapGR<- join_overlap_left(GR1, GR2) bind_ranges(overlapGR, uniqueGR2)
This works once I've essentially pre-figured out which metadata columns I want to keep from each GRanges. Another way would be to add the ability to merge metadata if the column names are the same (i.e. sum, list, paste, etc..), or just add the .x and .y. Thanks!
Hi All,
I've been working on combing GRanges from several datasets, and wish to have a function from your package that works similar to the tidyverse
full_join
which includes any extry from at least one Granges object. Also, metadata full_joined.I have written a simple function to use by myself and worked on my dataset (GRanges with meta data). But I found it hard to incorporate to your packages because I think yours works on a range of *Ranges objects. Anyway, I would like to help and share it if anyone is interested to develop such a function.