morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License
3 stars 2 forks source link

Fix error in gene_to_region function #198

Closed vladimirsouza closed 1 year ago

vladimirsouza commented 1 year ago

In this pull request, I fixed an error in the gene_to_region function.

Example of a code to reproduce the error:

library(GAMBLR)
library(tidyverse)
myc_region = gene_to_region(gene_symbol = "MYC", genome_build = "grch37", return_as = "region")

Error message:

Can't convert `y` <character> to match type of `x` <double>.

The code mutate_all(na_if,"") (see line inside the function) was changed to avoid incompatibilities between arguments x and y of function na_if.

Pull Request Checklists

Required

library(GAMBLR)
library(tidyverse)

myc_region = gene_to_region(gene_symbol = "MYC", genome_build = "grch37", return_as = "region")

This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.

Checklist for changes to existing code

myc_region = gene_to_region(gene_symbol = "MYC", genome_build = "grch37", return_as = "region")
myc_region
myc_cns = get_cn_segments(region = myc_region, 
                          projection = "grch37",
                          this_seq_type = "genome", 
                          streamlined = FALSE, 
                          from_flatfile = TRUE, 
                          with_chr_prefix = FALSE) %>% dplyr::filter(CN != 2)
head(myc_cns)
vladimirsouza commented 1 year ago

I used $GAMBLR_PATH/inst/perl/check_functions.pl and it found tons of dplyr functions to add the prefix dplyr::. I guess we are not doing that, right?

rdmorin commented 1 year ago

I used $GAMBLR_PATH/inst/perl/check_functions.pl and it found tons of dplyr functions to add the prefix dplyr::. I guess we are not doing that, right?

I would check with @mattssca . It may not be necessary as long as our functions now declare that they require/import dplyr explictly

rdmorin commented 1 year ago

Can you post the contents of myc_region that you get when you run the updated example?

mattssca commented 1 year ago

Thanks for the update Vladimir and congratulations on your first PR. I am not able to reproduce the error and the example runs fine in my case. I am wondering if this relates to the dplyr version. I am currently running version 1.0.8. Could you check what version you are using?

mattssca commented 1 year ago

Ryan is correct, we do not need to explicitly prefix any dplyr functions with dplyr::. With the exception of filter. This needs to be specified with dplyr::filter, since there are conflicting function names between different packages. Does that make sense?

mattssca commented 1 year ago

For reference, this is what I get when I run the example in my session:

> myc_region = gene_to_region(gene_symbol = "MYC", genome_build = "grch37", return_as = "region")
1 region(s) returned for 1 gene(s)
> myc_region
[1] "8:128747680-128753674"
mattssca commented 1 year ago

According to the changelog for dplyr 1.1.0, na_if() now uses the vctrs package, which is stricter about type stability:

na_if() (#6329) now casts y to the type of x before comparison, which makes it clearer that this function is type and size stable on x.

So another potential fix could be this: na_if(x, "0")

vladimirsouza commented 1 year ago

The content of myc_region is

> myc_region
[1] "8:128747680-128753674"

Just like what mattssca got.

vladimirsouza commented 1 year ago

My dplyr version

> packageVersion("dplyr")
[1] ‘1.1.2’
mattssca commented 1 year ago

Right, let me update my dplyr and see if I can reproduce this error.

rdmorin commented 1 year ago

I'm not sure why you need to reproduce the bug if we know the bug exists and it's already fixed.

mattssca commented 1 year ago

I wanted to check to see if this indeed is a bug related to the dplyr version. Just as a sanity check. Regardless, I should probably still update my dplyr version.

Kdreval commented 1 year ago

I think what Adam is worried is that the bug fix for version >1.1 is probably not backwards compatible and won't work unless everyone else will update their dplyr version to the later release. Just taking this as opportunity to remind about this: https://morinlabsfu.slack.com/archives/CNFLH7YJ1/p1681434375143809 Adam, before you update dplyr, can you please check if the version of dplyr you have will have any issues dealing with new syntax? If it is fine on older versions, I think we can merge the fix then.

vladimirsouza commented 1 year ago

Thanks for bringing this up, Kdreval. But I think it should be fine. Because in my change, I used only basic R programming (region[region == ""] = NA), instead of dplyr::na_if function.

So, It should work, regardless dplyr version.

mattssca commented 1 year ago

Yes, I can confirm. The prosed fix seems to be backward compatible with dplyr version 1.0.8 as suspected, so we are good to go. Just wanted to double-check for the very reason Kostia just explained.

vladimirsouza commented 1 year ago

Thanks for confirming that, mattssca.

Kdreval commented 1 year ago

Thanks Adam and Vlad!

mattssca commented 1 year ago

@vladimirsouza Tip, close the issue once the proposed fix has made its way to master and not once you publish a new PR with the fix. This is usually how I do it, but others might have another take on this.