Closed snystrom closed 4 years ago
Alternatively, it just occurred to me that it would be ideal for the `join_nearest_*` family of functions to have a `distance` flag that adds the distance from the anchor point to the joined range.
Edit: I am also comfortable trying to implement this approach, but feedback before I start would probably be a good thing.
Hi!
I agree that this would be a cool feature to have! Two thoughts on how to proceed; I think both would be useful.
1. Add an argument to `join_nearest` to include a distance column in the output, with `FALSE` as the default. I think if you look at the current implementation this would just swap out the function used to generate the Hits, as I'm fairly sure `distanceToNearest` and `nearest` are equivalent.
2. Add a function called `add_nearest_distance` that just adds the distance as an mcol on the query, similar in design to `add_count` from dplyr. This reminds me that `add_overlap_count` would be useful as well.
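A minimal sketch of the second idea, using only GenomicRanges. The helper name `add_nearest_distance_sketch` and its `name` argument are hypothetical and are not the plyranges implementation; the last lines check, on this small example, that `nearest` selects the same subject ranges as `distanceToNearest`.

```r
library(GenomicRanges)

# Hypothetical helper for illustration only, not the plyranges implementation.
add_nearest_distance_sketch <- function(x, y, name = "distance") {
  # Undirected search: strand is ignored when looking for the nearest range
  hits <- distanceToNearest(x, y, ignore.strand = TRUE)
  # Query ranges without any neighbour (e.g. no subject on that sequence) stay NA
  dist <- rep(NA_integer_, length(x))
  dist[queryHits(hits)] <- mcols(hits)$distance
  mcols(x)[[name]] <- dist
  x
}

query <- GRanges(c("chr1", "chr1", "chr2"),
                 IRanges(start = c(10, 100, 10), width = 5))
subject <- GRanges("chr1", IRanges(start = c(30, 200), width = 5))

annotated <- add_nearest_distance_sketch(query, subject)
annotated$distance

# On this example, nearest() picks the same subject ranges as distanceToNearest()
hits <- distanceToNearest(query, subject, ignore.strand = TRUE)
same_hits <- identical(
  nearest(query, subject, ignore.strand = TRUE)[queryHits(hits)],
  subjectHits(hits)
)
```

Returning `NA` for query ranges with no neighbour keeps the output the same length as the input, which is one possible design choice; dropping those ranges would be another.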
Happy to review any PR if you're keen to have a go!
Sounds good. I've got something working and will PR soon. Quick question about defaults. Currently, my implementation of `add_nearest_distance` uses the default behavior of `distanceToNearest`, which has `ignore.strand = FALSE` as its default. I wonder if it might cause confusion, since `join_nearest()` uses `ignore.strand = TRUE`. Maybe this is just handled by good documentation, but if you had thoughts on what may integrate best with the rest of the stack, I'm open to suggestions.
Thanks for the PR!
Our default is always `ignore.strand = TRUE`, but we don't expose this as an argument to a function. Instead we add functions that take strand into account with the `directed` suffix, so I would usually split this up so there would be an `add_nearest_distance_directed` and an `add_nearest_distance`. I'll try and take a look at this over the next couple of days :D
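To illustrate the split described above, here is a rough sketch in which the undirected variant ignores strand and the directed variant respects it. All three function names are hypothetical stand-ins, and the implementation assumes nothing beyond `distanceToNearest` from GenomicRanges.

```r
library(GenomicRanges)

# Shared worker; every function name in this block is hypothetical.
nearest_distance_impl <- function(x, y, ignore_strand) {
  hits <- distanceToNearest(x, y, ignore.strand = ignore_strand)
  dist <- rep(NA_integer_, length(x))
  dist[queryHits(hits)] <- mcols(hits)$distance
  x$distance <- dist
  x
}

# Undirected: strand is ignored
add_nearest_distance_sketch <- function(x, y) {
  nearest_distance_impl(x, y, ignore_strand = TRUE)
}

# Directed: only strand-compatible ranges are considered
add_nearest_distance_directed_sketch <- function(x, y) {
  nearest_distance_impl(x, y, ignore_strand = FALSE)
}

query <- GRanges("chr1", IRanges(start = 100, width = 5), strand = "+")
subject <- GRanges("chr1", IRanges(start = c(110, 150), width = 5),
                   strand = c("-", "+"))

# The closest range is on the opposite strand, so the two variants disagree
undirected <- add_nearest_distance_sketch(query, subject)$distance
directed <- add_nearest_distance_directed_sketch(query, subject)$distance
```

Here the undirected call reports the gap to the minus-strand range, while the directed call skips it and reports the gap to the farther plus-strand range.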
Oh, duh. I'll add those into the PR.
Often I want to compare the distance between two GRanges objects. I usually solve this problem using the GenomicRanges function `distanceToNearest`, then appending the `distance` mcol data to a new column of the subject hits. This is annoying to do inside a `plyranges::mutate` call because, as far as I'm aware, it requires 2 steps. It would be nice to add a helper function to facilitate this, perhaps as below:
I'm happy to implement this and do a PR, but some thoughts on implementation or input on whether I'm forgetting an edge case or something would be nice first.
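For context, the two-step workflow described above might look roughly like this with GenomicRanges alone. The example ranges and variable names are made up for illustration.

```r
library(GenomicRanges)

# Made-up example ranges
query <- GRanges("chr1", IRanges(start = c(10, 100), width = 5))
subject <- GRanges("chr1", IRanges(start = c(30, 200), width = 5))

# Step 1: find the nearest subject range for each query range
hits <- distanceToNearest(query, subject)

# Step 2: pull out the matched subject ranges and append the distance column
nearest_subject <- subject[subjectHits(hits)]
nearest_subject$distance <- mcols(hits)$distance
nearest_subject
```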