rcavalcante / annotatr

Package Homepage: http://bioconductor.org/packages/devel/bioc/html/annotatr.html Bug Reports: https://support.bioconductor.org/p/new/post/?tag_val=annotatr.

Error in build_annotations in mm10: Error in download.file(url, destfile, quiet = TRUE) #58

Closed GitR-Bio closed 5 months ago

GitR-Bio commented 9 months ago

Hi, I'm getting the following error in build_annotations. Any ideas would be highly appreciated.

loading from cache
Error in download.file(url, destfile, quiet = TRUE) : 
  cannot open URL 'https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz'

The detailed commands are below:

> library(annotatr)

> annots=c("mm10_genes_1to5kb", "mm10_genes_promoters", "mm10_genes_cds", "mm10_genes_5UTRs", "mm10_genes_exons", "mm10_genes_firstexons","mm10_genes_introns","mm10_genes_intronexonboundaries","mm10_genes_exonintronboundaries","mm10_genes_3UTRs", "mm10_genes_intergenic","mm10_enhancers_fantom","mm10_lncrna_gencode" )

> BiocManager::install("TxDb.Mmusculus.UCSC.mm10.knownGene")
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for
details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
Bioconductor version 3.18 (BiocManager 1.30.22), R 4.3.2 (2023-10-31 ucrt)
Installation paths not writeable, unable to update packages
  path: C:/Program Files/R/R-4.3.2/library
  packages:
    cluster, foreign, lattice, MASS, Matrix, mgcv, nlme, rpart
Warning message:
package(s) not installed when version(s) same as or greater than current; use `force = TRUE` to re-install:
  'TxDb.Mmusculus.UCSC.mm10.knownGene' 

> annotations = build_annotations(genome = 'mm10', annotations = annots)
Building enhancers...
snapshotDate(): 2023-10-23
loading from cache
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for
details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for
details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building intergenic...
Building cds...
Building 5UTRs...
Building 3UTRs...
Building exons...
Building first exons...
Building introns...
Building intron exon boundaries...
Building exon intron boundaries...
snapshotDate(): 2023-10-23
Building lncRNA transcripts...
loading from cache
Error in download.file(url, destfile, quiet = TRUE) : 
  cannot open URL 'https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz'
Called from: download.file(url, destfile, quiet = TRUE)
Browse[1]> 

> annotatr_cache$list_env()
character(0)

Thanks in advance.

mbassalbioinformatics commented 9 months ago

Similar problem when attempting to build annotations

 build_annotations(genome = 'mm10', annotations = "mm10_cpgs")

I then get back the error

Building CpG islands...
Error in open.connection(5L, "rb") : HTTP error 403.

I've tried hg38 and still get the same problem. I've tried multiple machines in different geographic locations and the problem persists.

Any ideas/suggestions please?

rcavalcante commented 9 months ago

It seems we've reached the end of the road for the URLs being stable and accessible, and this is causing failure on the Bioconductor build machines as well.

I'll be looking for Bioconductor resources that provide this data, to avoid downloading from brittle links.

Unfortunately, work responsibilities will prevent me from working on this until next week at the earliest.

mbassalbioinformatics commented 9 months ago

OK, so it seems that in the build_annotation.R file, the following URL

http://hgdownload.cse.ucsc.edu/

needs to be changed to

http://hgdownload2.cse.ucsc.edu/

So change the lines in question (five in total, if I remember correctly), save, re-create the tar.gz archive of the package folder, and reinstall the package from that archive. That seems to have worked for me.
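
For anyone who would rather script that edit than do it by hand, here is a minimal sketch in base R. It assumes you have an unpacked copy of the package source; `patch_host()` is a hypothetical helper written for this comment, not something annotatr provides, and the demo below runs on a throwaway directory rather than on the real package.

```r
# Hypothetical helper (not part of annotatr): replace one host string with
# another in every .R file under <pkg_dir>/R and return the files it changed.
patch_host <- function(pkg_dir,
                       old = "hgdownload.cse.ucsc.edu",
                       new = "hgdownload2.cse.ucsc.edu") {
  r_files <- list.files(file.path(pkg_dir, "R"), pattern = "\\.R$",
                        full.names = TRUE)
  changed <- character(0)
  for (f in r_files) {
    src <- readLines(f, warn = FALSE)
    # fixed = TRUE treats the host as a literal string, not a regex
    patched <- gsub(old, new, src, fixed = TRUE)
    if (!identical(src, patched)) {
      writeLines(patched, f)
      changed <- c(changed, f)
    }
  }
  invisible(changed)
}

# Demonstration on a temporary directory that mimics a package layout.
demo_dir <- file.path(tempdir(), "patch_host_demo")
dir.create(file.path(demo_dir, "R"), recursive = TRUE, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "R", "example.R")
writeLines(
  'url <- "http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz"',
  demo_file
)

changed <- patch_host(demo_dir)
print(readLines(demo_file))
```

After patching a real source tree, something like `install.packages(pkg_dir, repos = NULL, type = "source")` should reinstall from the modified folder, followed by a restart of the R session. Whether the alternate host stays reachable is an assumption, so check that the URL opens before relying on it.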

GitR-Bio commented 9 months ago

Thank you so much for your valuable responses. The connection has probably been re-established in my case, but a warning now appears at the end of the output.

annotations = build_annotations(genome = 'mm10', annotations = annots)
Building enhancers...
snapshotDate(): 2023-10-23
loading from cache
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for
details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building intergenic...
Building cds...
Building 5UTRs...
Building 3UTRs...
Building exons...
Building first exons...
Building introns...
Building intron exon boundaries...
Building exon intron boundaries...
snapshotDate(): 2023-10-23
Building lncRNA transcripts...
loading from cache
Warning message:
In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 2 out-of-bound ranges located on sequence chr4_JH584295_random. Note that ranges located
  on a sequence whose length is unknown (NA) or on a circular sequence are not considered out-of-bound (use
  seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use
  trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more information.

GitR-Bio commented 9 months ago

Regarding the CpG islands error reported above: I have tried with "hg38" and it now seems to be working. Could you please rerun to check whether it has resolved itself?

annots=c("hg38_cpg_islands","hg38_genes_3UTRs","hg38_genes_intergenic","hg38_genes_exonintronboundaries","hg38_lncrna_gencode") 
annotations = build_annotations(genome = 'hg38', annotations = annots)

'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building intergenic...
Building 3UTRs...
Building exons...
Building introns...
Building exon intron boundaries...
Building CpG islands...
snapshotDate(): 2023-10-23
Building lncRNA transcripts...
loading from cache

mbassalbioinformatics commented 9 months ago

Have you changed the URL in the source file? If not, do as I described in my earlier comment and try again. Make sure you restart your R session once you have reinstalled the updated package.

rcavalcante commented 5 months ago

I didn't have this problem on a fresh install. I think this is actually a transient issue. I was able to build hg19, hg38, and mm10 lncRNA resources.
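
If the failures really are transient, retrying the call a few times may be enough. The following is a minimal base R sketch of that idea; `with_retry()` is a hypothetical helper written for this comment (annotatr does not ship it), and the demo uses a simulated flaky function instead of a real download.

```r
# Hypothetical helper (not part of annotatr): call `fn` until it succeeds or
# `max_tries` attempts have failed, sleeping `wait` seconds between attempts.
with_retry <- function(fn, max_tries = 3, wait = 5) {
  stopifnot(max_tries >= 1)
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    ok <- TRUE
    value <- tryCatch(fn(), error = function(e) {
      # Record the failure in with_retry()'s frame and keep looping
      ok <<- FALSE
      last_error <<- e
      NULL
    })
    if (ok) return(value)
    message("Attempt ", attempt, " failed: ", conditionMessage(last_error))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  # Every attempt failed: re-signal the last error
  stop(last_error)
}

# Simulated flaky step: fails twice, then succeeds on the third call.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("cannot open URL (simulated)")
  "annotations built"
}

result <- with_retry(flaky, max_tries = 5, wait = 0)
print(result)
```

With annotatr loaded, the same wrapper could be used as `with_retry(function() build_annotations(genome = 'mm10', annotations = annots))`. It only helps with temporary outages; a URL that has moved for good will fail on every attempt.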

rcavalcante commented 5 months ago

Moreover, I have been able to build the CpG island annotations that were also mentioned in this thread.