myles-lewis / locuszoomr

A pure R implementation of locuszoom for plotting genetic data at genomic loci accompanied by gene annotations.
GNU General Public License v3.0
18 stars 5 forks source link

link_recomb has curl errors #17

Closed Sabor117 closed 6 months ago

Sabor117 commented 7 months ago

I did notice the other recent issue with the link_recomb function, but that seems to be separate to the issue I have encountered.

Essentially after running the following:

analysis_locus = locus(index_snp = currrsid, data = plot_frame, flank = 5e5,
                            chrom = "chr", pos = "pos", p = "p", ens_db = ensDb_v111)

analysis_locus_ld = link_LD(analysis_locus, token = "<MYTOKEN>")

analysis_locus_ld = link_recomb(analysis_locus_ld, genome = "hg38")

I got this in return:

Retrieving recombination data from UCSC
Warning messages:
1: In curlSetOpt(..., .opts = .opts, curl = h, .encoding = .encoding) :
  Error setting the option for # 3 (status = 43) (enum = 81) (value = 0x3b408e000): A libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value!
2: In curlSetOpt(..., .opts = .opts, curl = h, .encoding = .encoding) :
  Error setting the option for # 3 (status = 43) (enum = 81) (value = 0x3bf625180): A libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value!

Making the LocusZoom plot with locus_plot still worked as expected and produced the same plots as before, with a "Recombination %" axis. However, there were no recombination rate blue lines within the plot.

Incidentally, this package is excellent! Really nice to use, thanks so much!

myles-lewis commented 7 months ago

Hi Sabor117,

That's odd - I've not encountered that before. Normally if the UCSC API gives an error no data is returned in which case link_recomb() will return the locus object containing NULL in the $recomb element of the locus object.

What chromosome are you querying? I found I got no data for chromosome X with Hg19. But then it did not add the Recombination axis at all.

The recombination data should be accessible in analysis_locus_ld$recomb. Can you do the following and show me what it generates please:

str(analysis_locus_ld$recomb)
head(analysis_locus_ld$recomb)

This should help to see what actual data, if any, was retrieved from UCSC.

Also, what OS are you on? Can you provide sessionInfo() so I can check if it's because of old versions of packages e.g. 'rtracklayer'.

Bw, Myles

Sabor117 commented 7 months ago

Okay, so to punish me for asking, when I just re-ran the code, it now didn't throw the error.

> str(analysis_locus_ld$recomb)
'data.frame':   587 obs. of  3 variables:
 $ start: num  21089478 21089483 21089669 21091844 21093259 ...
 $ end  : num  21089483 21089669 21091844 21093259 21095223 ...
 $ value: num  0.388 0.379 0.379 0.118 0.119 ...
> head(analysis_locus_ld$recomb)
     start      end    value
1 21089478 21089483 0.387659
2 21089483 21089669 0.379096
3 21089669 21091844 0.379031
4 21091844 21093259 0.118114
5 21093259 21095223 0.119046
6 21095223 21097826 0.119350
> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.9 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /opt/intel/oneapi/mkl/2024.0/lib/libmkl_gf_lp64.so.2;  LAPACK version 3.10.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Helsinki
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] AnnotationHub_3.10.0      BiocFileCache_2.10.1     
 [3] dbplyr_2.4.0              dplyr_1.1.4              
 [5] stringr_1.5.1             EnsDb.Hsapiens.v75_2.99.0
 [7] ensembldb_2.26.0          AnnotationFilter_1.26.0  
 [9] GenomicFeatures_1.54.1    AnnotationDbi_1.64.1     
[11] Biobase_2.62.0            GenomicRanges_1.54.1     
[13] GenomeInfoDb_1.38.5       IRanges_2.36.0           
[15] S4Vectors_0.40.2          BiocGenerics_0.48.1      
[17] locuszoomr_0.2.0          data.table_1.14.10       

loaded via a namespace (and not attached):
 [1] DBI_1.2.1                     bitops_1.0-7                 
 [3] biomaRt_2.58.0                rlang_1.1.3                  
 [5] magrittr_2.0.3                matrixStats_1.2.0            
 [7] compiler_4.3.2                RSQLite_2.3.4                
 [9] png_0.1-8                     vctrs_0.6.5                  
[11] ProtGenerics_1.34.0           pkgconfig_2.0.3              
[13] crayon_1.5.2                  fastmap_1.1.1                
[15] ellipsis_0.3.2                XVector_0.42.0               
[17] utf8_1.2.4                    promises_1.2.1               
[19] Rsamtools_2.18.0              purrr_1.0.2                  
[21] bit_4.0.5                     zlibbioc_1.48.0              
[23] cachem_1.0.8                  jsonlite_1.8.8               
[25] progress_1.2.3                blob_1.2.4                   
[27] later_1.3.2                   DelayedArray_0.28.0          
[29] interactiveDisplayBase_1.40.0 BiocParallel_1.36.0          
[31] parallel_4.3.2                prettyunits_1.2.0            
[33] R6_2.5.1                      stringi_1.8.3                
[35] rtracklayer_1.62.0            Rcpp_1.0.12                  
[37] SummarizedExperiment_1.32.0   zoo_1.8-12                   
[39] R.utils_2.12.3                gggrid_0.2-0                 
[41] httpuv_1.6.14                 Matrix_1.6-5                 
[43] tidyselect_1.2.0              abind_1.4-5                  
[45] yaml_2.3.8                    codetools_0.2-19             
[47] curl_5.2.0                    lattice_0.21-9               
[49] tibble_3.2.1                  LDlinkR_1.3.0                
[51] withr_3.0.0                   shiny_1.8.0                  
[53] KEGGREST_1.42.0               xml2_1.3.6                   
[55] Biostrings_2.70.1             pillar_1.9.0                 
[57] BiocManager_1.30.22           filelock_1.0.3               
[59] MatrixGenerics_1.14.0         plotly_4.10.4                
[61] generics_0.1.3                RCurl_1.98-1.14              
[63] BiocVersion_3.18.1            hms_1.1.3                    
[65] ggplot2_3.4.4                 munsell_0.5.0                
[67] scales_1.3.0                  xtable_1.8-4                 
[69] glue_1.7.0                    lazyeval_0.2.2               
[71] tools_4.3.2                   BiocIO_1.12.0                
[73] GenomicAlignments_1.38.2      XML_3.99-0.16                
[75] cowplot_1.1.2                 grid_4.3.2                   
[77] tidyr_1.3.0                   colorspace_2.1-0             
[79] GenomeInfoDbData_1.2.11       restfulr_0.0.15              
[81] cli_3.6.2                     rappdirs_0.3.3               
[83] fansi_1.0.6                   S4Arrays_1.2.0               
[85] viridisLite_0.4.2             gtable_0.3.4                 
[87] R.methodsS3_1.8.2             digest_0.6.34                
[89] SparseArray_1.2.3             rjson_0.2.21                 
[91] htmlwidgets_1.6.4             R.oo_1.25.0                  
[93] memoise_2.0.1                 htmltools_0.5.7              
[95] lifecycle_1.0.4               httr_1.4.7                   
[97] mime_0.12                     bit64_4.0.5   

However, it also actually didn't make the plots with recombination rates...

Go figure?

myles-lewis commented 7 months ago

Ok. Try updating the locuszoomr package to version 0.2.1 which I released to CRAN a few days ago. This has better error control.

I find the UCSC API is weird. It sometimes occasionally just throws an error at random for no obvious reason. I suspect the UCSC server may get overloaded with requests and sometimes times out, but R might not be able to detect this or give an intelligent error. So the call to UCSC API from within link_recomb() sometimes just fails with an error - I've seen the curl error myself but only rarely during testing. But I restarted my R session and it disappeared.

Version 0.2.1 of locuszoomr is designed that if you call link_recomb() again it just tries to query UCSC again and get the data even if the first time it failed. In earlier versions this wouldn't work because of caching. In 0.2.1 the cache is reset if an error is detected. In my experience this fills in any missing loci where the API request was refused first time. But some areas of the genome are missing data, e.g. chromosome X in Hg19. They'll always give the same error every time.

However, if you have data in the $recomb slot it ought to plot ok. If it doesn't plot with locus_plot(), does locus_ggplot() do the same thing?

If it doesn't plot, I don't mind taking a look at it - you're welcome to email me the locus object if you wish.

BTW, base graphics locus_plot() the recombination rate line is under the points and sometimes the recombination line is hidden by them, whereas in locus_ggplot() the recombination line is on top (I can't easily change that!). So the line is easy to see in locus_ggplot(), especially if you change its colour using recomb_col.

Bw, Myles

Sabor117 commented 6 months ago

Hi Myles,

A quick update after the other day I have been working on this again and I should mention that the I again got the same warning/error messages when running link_recomb()

However, despite the warning/error it still produced the same contents of analysis_locus_ld$recomb as before.

It seems worth mentioning as well that I did as you suggested and installed locuszoomr from CRAN again, but when I checked the sessionInfo it says it is still version 0.20.0. Have the CRAN files been definitely updated?

IN_ALL_10_21589478

I have attached the picture I was able to make, and it does look like there is a recombination rate visible in the background (as you said, concealed by the SNPs) but I figured I'd update you on my progress here.

Sabor117 commented 6 months ago

Hi again Myles,

I've also had another error with the package now and am not sure whether to post this as a separate issue or not.

Essentially I've had this error a couple of times now with different SNPs/loci:

Obtaining LD on 1000 SNPs
LDlink server is working...

Error in `[.data.frame`(ldm, , index_snp) : undefined columns selected

This is from running link_LD, and like the other error I'm not sure exactly what is going wrong. The SNP is definitely present in the locus() object passed to link_LD.

myles-lewis commented 6 months ago

Hi Sabor117,

The 2nd issue with link_LD has definitely been partially addressed in the latest version of locuszoomr which is 0.2.1. The latest version is on CRAN for all OS. However, if you are installing from a CRAN mirror, it's possible that the mirror server hasn't updated yet. I suggest that you install the latest version 0.2.1 direct from GitHub using (you'll need the 'devtools' package installed):

devtools::install_github("myles-lewis/locuszoomr")

If you use the latest version you won't get the error message your wrote above. Instead it might work (hooray!). In the past the error would have been triggered if your index SNP was not in the top 1000 significant SNPs at a locus. That issue was fixed in v0.2.1.

But it might still give a message "Index SNP not found in LDlink data" and it will return your original locus object untouched. The issue then is that the index SNP is not found in the LD link API return object. If that's the case the LD data cannot be assigned. That's rare. If you still find an example after v0.2.1 can you email the locus object to me [I'm at myles (dot) lewis (at) qmul (dot) ac (dot) uk] and I can take a look and see if there is a feasible workaround.

Bw, Myles