rcavalcante / annotatr

Package Homepage: http://bioconductor.org/packages/devel/bioc/html/annotatr.html Bug Reports: https://support.bioconductor.org/p/new/post/?tag_val=annotatr.

lncrna ftp link not working? #17

Closed beausjo closed 2 years ago

beausjo commented 5 years ago

For the past few days, I've been having trouble getting the built-in hg19_lncrna_gencode annotation to work. I updated annotatr, and here's the error:

> annotatr::build_annotations(genome = 'hg19', annotations = 'hg19_lncrna_gencode')
Building lncRNA transcripts...
trying URL 'ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz'
Error in download.file(resource(con), destfile) : 
  cannot open URL 'ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz'

I think the problem is the Sanger FTP site, since I can download the file manually, albeit from a slightly different address: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz instead of ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz

It seems like a simple fix inside of build_lncrna_annots but maybe something else is going on?
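As a stopgap until the package is patched, the address change described above can be sketched in a few lines of R. This is only an illustration: `ebi_gencode_url()` and `fetch_gtf()` are hypothetical helpers, not annotatr functions, and the sketch assumes the EBI mirror keeps the same path below the `gencode/` directory, as the two URLs quoted above suggest.

```r
# Hypothetical helper (not part of annotatr): swap the Sanger FTP prefix
# for the EBI prefix from which the file could be downloaded manually.
ebi_gencode_url <- function(url) {
  sub("ftp://ftp.sanger.ac.uk/pub/gencode/",
      "ftp://ftp.ebi.ac.uk/pub/databases/gencode/",
      url, fixed = TRUE)
}

old_url <- "ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz"
new_url <- ebi_gencode_url(old_url)

# Hypothetical helper: download the file and return the local path,
# or NULL (with a message) if the download fails.
fetch_gtf <- function(url, destfile = file.path(tempdir(), basename(url))) {
  tryCatch({
    status <- utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
    if (status != 0) stop("download.file returned status ", status)
    destfile
  }, error = function(e) {
    message("Download failed: ", conditionMessage(e))
    NULL
  })
}

# gtf_path <- fetch_gtf(new_url)  # needs network access, so left commented out
```

URLs that do not start with the Sanger prefix pass through `ebi_gencode_url()` unchanged, so the helper is safe to apply to any address.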

rcavalcante commented 5 years ago

Thanks for the heads up, it looks like they changed their link structure. I'll patch that and push it to GitHub, but because Bioconductor is in the middle of a release, they've frozen their devel and release branches, so I'll get those updated in Bioc 3.8 and in the devel version for Bioc 3.9. Unfortunately, I don't think I'll be able to patch the package for Bioc 3.7.

clersdom commented 4 years ago

Hi,

I am also trying to get the hg19_lncrna_gencode annotations and am getting the same error. I have also tried to build a custom annotation, but that fails as well (reported recently as another issue).

@beausjo, have you been able to load the lncRNAs from the Sanger ftp address above?

Many thanks, Clara

rcavalcante commented 4 years ago

Hi All,

I just pushed a hotfix to the Bioc 3.10 version of annotatr that updates the hg19 lncRNA URL, and that should work its way through the Bioc build system in the next couple of days.

I also pushed hotfixes for Bioc 3.8 and Bioc 3.9 releases. However, you will have to install these with devtools::install_github because those release branches are frozen at Bioconductor.

For Bioc 3.8

devtools::install_github('rcavalcante/annotatr@RELEASE_3_8-lncRNA')

For Bioc 3.9

devtools::install_github('rcavalcante/annotatr@RELEASE_3_9-lncRNA')

For Bioc 3.10

devtools::install_github('rcavalcante/annotatr@RELEASE_3_10')

I tested each in a Docker container with the right R / Bioconductor versions and they seem to work. Let me know if you have any problems, and if I don't hear back in a week, I'll close the issue.
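The version-to-branch mapping listed above can be captured in a small lookup so the right ref is chosen automatically. This is a sketch only: `hotfix_ref()` is a hypothetical helper, and the table contains just the three branches named in this comment.

```r
# Hypothetical helper: map a Bioconductor version to the GitHub ref listed
# in this thread. Pass the version as a string or a package_version object;
# a bare number such as 3.10 would be read as 3.1.
hotfix_ref <- function(bioc_version) {
  refs <- c("3.8"  = "RELEASE_3_8-lncRNA",
            "3.9"  = "RELEASE_3_9-lncRNA",
            "3.10" = "RELEASE_3_10")
  key <- as.character(bioc_version)
  if (!key %in% names(refs)) {
    stop("No hotfix branch listed for Bioconductor ", key)
  }
  paste0("rcavalcante/annotatr@", refs[[key]])
}

hotfix_ref("3.9")

# Possible use (requires the BiocManager and devtools packages):
# devtools::install_github(hotfix_ref(BiocManager::version()))
```

Looking the ref up from `BiocManager::version()` avoids installing a branch built for a different release than the one in use.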

Thanks, Raymond

clersdom commented 4 years ago

Hi Raymond, thanks for the updates.

As I have Bioc version 3.9, I have run:

devtools::install_github('rcavalcante/annotatr@RELEASE_3_8-lncRNA')

Then,

annots = c('hg19_cpgs', 'hg19_basicgenes', 'hg19_enhancers_fantom', 'hg19_lncrna_gencode', 'hg19_genes_intronexonboundaries')
annotations = build_annotations(genome = 'hg19', annotations = annots)
Building enhancers...
'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building 5UTRs...
Building 3UTRs...
Building exons...
Building introns...
Building intron exon boundaries...
snapshotDate(): 2019-05-02
Building CpG islands...
downloading 0 resources
loading from cache ‘AH5086 : 5086’
Building CpG shores...
Building CpG shelves...
Building inter-CpG-islands...
Building lncRNA transcripts...
trying URL 'ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz'
Error in download.file(resource(con), destfile) : 
  cannot open URL 'ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz'

So I am still getting this error message. Is there anything else that needs to be done? Or should I just wait a couple more days?

Thanks again, Clara

rcavalcante commented 4 years ago

Hi Clara,

In your message you said Bioc 3.9, but in the code you used annotatr@RELEASE_3_8-lncRNA. Is one of those a mistake?

But nevertheless, the fact that it's still looking at ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz indicates that the package didn't update.

Are you certain that the package installation didn't fail, and that the branch is correct? That's the only situation I can imagine in which you'd try the lncRNAs again and the package would still be using the old URL.

Thanks, Raymond

PS If you take a look here, you will note the change in URL.

clersdom commented 4 years ago

Yes sorry, I used devtools::install_github('rcavalcante/annotatr@RELEASE_3_9-lncRNA').

If I try installing again:

devtools::install_github('rcavalcante/annotatr@RELEASE_3_9-lncRNA')
Skipping install of 'annotatr' from a github remote, the SHA1 (3b778b88) has not changed since last install.
  Use `force = TRUE` to force installation

Is there any way to confirm that it has been installed correctly? When I run sessionInfo(), annotatr does not come up. Thanks! Clara

rcavalcante commented 4 years ago

Hi Clara,

Try the following,

library(annotatr)
sessionInfo()

and check that you see that annotatr is version 1.10.1. Though the devtools::install_github message seems to indicate that's the version you have.

If you're still getting the error, you could try remove.packages('annotatr') and then install it again. As I linked to here, the URL has definitely been changed, and when I test it I am able to download the resource.
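The check-then-reinstall steps above can be sketched as follows. `needs_reinstall()` is a hypothetical helper, and the version 1.10.1 and the branch name are the ones given in this thread for Bioc 3.9. Note that sessionInfo() only lists packages loaded in the current session, which is why annotatr appears there only after library(annotatr).

```r
# Hypothetical helper: TRUE when `pkg` is not installed or is older than
# the `required` version string.
needs_reinstall <- function(pkg, required) {
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  is.null(installed) || installed < package_version(required)
}

# Possible use (requires devtools; restart R afterwards so the new copy is loaded):
# if (needs_reinstall("annotatr", "1.10.1")) {
#   if ("annotatr" %in% rownames(installed.packages())) remove.packages("annotatr")
#   devtools::install_github("rcavalcante/annotatr@RELEASE_3_9-lncRNA", force = TRUE)
# }
```

Passing `force = TRUE` makes install_github reinstall even when the SHA1 recorded for the previous install has not changed.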

Thanks, Raymond

clersdom commented 4 years ago

Hi Raymond, I have tried uninstalling and installing again, and it has worked now. Thanks a lot for the tips!

The funny thing is that I am now getting the warning that I have seen reported here as well: Clarify warning message for subset_order_tbl() #16

I have 13 known annotations, and the regions are annotated fine, but the problem appears at the plotting step, when I run:

MY_annotations = plot_annotation(
    annotated_regions = dm_annotated,
    annotation_order = annots_order,
    plot_title = 'Number of DMRs per annotation',
    x_label = 'Known Gene Annotations',
    y_label = 'Count')

Warning message:
In subset_order_tbl(tbl = annotated_regions, col = "annot.type", :
  There are elements in col_order that are not present in the corresponding column. Check for typos, or this could be a result of 0 tallies.

clersdom commented 4 years ago

Hi again,

It seems that there is an issue when the hg19_genes_intronexonboundaries and hg19_lncrna_gencode annotations are plotted together. When I run plot_annotation with one or the other, but not both in the same command, it does not produce an error.

rcavalcante commented 4 years ago

Hi,

I don't think the problem is caused by that. Here is a toy example and the result (which you should also be able to run):

library(annotatr)
library(ggplot2)

dm_file = system.file('extdata', 'IDH2mut_v_NBM_multi_data_chr9.txt.gz', package = 'annotatr')
extraCols = c(diff_meth = 'numeric', mu1 = 'numeric', mu0 = 'numeric')
dm_regions = read_regions(con = dm_file, genome = 'hg19', extraCols = extraCols, rename_score = 'pval', rename_name = 'DM_status', format = 'bed')

annotations = c('hg19_genes_intronexonboundaries', 'hg19_lncrna_gencode')
annots = build_annotations(genome = 'hg19', annotations = annotations)

dm_annotated = annotate_regions(
    regions = dm_regions,
    annotations = annots,
    ignore.strand = TRUE,
    quiet = TRUE)

plot = plot_annotation(annotated_regions = dm_annotated, annotation_order = annotations, plot_title = 'Test', x_label = 'Annotation', y_label = 'Count')
ggsave(filename = 'test.png', plot = plot, width = 6, height = 6)

[Image: test.png]

Perhaps the following things will help us track down this problem:

  1. A fuller context for your code snippet, including the code defining all the variables you're using.
  2. The image that is output by the code you ran where subset_order_tbl seems to be missing things.
  3. Does the example code above work for you?
  4. What does the summarization function look like on your data (see below)?
# Change dm_annotated to whatever your variable is
summarize_annotations(annotated_regions = dm_annotated)

# Counting annotation types
# # A tibble: 2 x 2
#   annot.type                          n
#   <chr>                           <int>
# 1 hg19_genes_intronexonboundaries  3595
# 2 hg19_lncrna_gencode              1309
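
Since the warning says that some elements of the order vector are absent from the annot.type column, a direct comparison of the two sets of names may help narrow things down. The sketch below uses a hypothetical helper, `compare_annotation_order()`, and made-up inputs; it does not establish what caused the warning in this thread.

```r
# Hypothetical helper: list names that appear in only one of the two sets.
compare_annotation_order <- function(annotation_order, observed_types) {
  list(
    not_in_data  = setdiff(annotation_order, observed_types),
    not_in_order = setdiff(observed_types, annotation_order)
  )
}

# Illustrative inputs only, not taken from real data
annotation_order <- c("hg19_genes_intronexonboundaries",
                      "hg19_lncrna_gencode",
                      "hg19_basicgenes")
observed_types <- c("hg19_genes_intronexonboundaries",
                    "hg19_lncrna_gencode")

compare_annotation_order(annotation_order, observed_types)

# With real data, the observed types could be taken from the summary tibble:
# observed_types <- summarize_annotations(annotated_regions = dm_annotated)$annot.type
```

Any name reported under `not_in_data` is either a typo, a name that never appears as an annot.type value, or an annotation with zero overlapping regions.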

Thanks, Raymond