rcavalcante / annotatr

Package Homepage: http://bioconductor.org/packages/devel/bioc/html/annotatr.html
Bug Reports: https://support.bioconductor.org/p/new/post/?tag_val=annotatr

lncRNA link not working #48

Closed AlejRSosa closed 5 months ago

AlejRSosa commented 1 year ago

Hello, I don't know if this project is still being updated or not, but I was trying to annotate my DMRs using annotatr and even though most of the annotations that I wanted are working, the one for hg19_lncrna_gencode is throwing an error as it is not finding the URL.

It is currently trying to find the information from 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz' and since this leads to an empty page, it is not working anymore.

I managed to download the .gz file from https://www.gencodegenes.org/human/release_19.html and thought I could make a custom annotation from it, but I have not been able to get that working. This is what I tried:

> data.file <- '/home/alejandrarodrigu21/Downloads/gencode.v19.long_noncoding_RNAs.gtf.gz'
> read_annotations(data.file, name='lncRNA', genome='hg19')
Error in .local(con, format, text, ...) : 
  unused argument (extraCols = character(0))

But I do not understand what the error is - I am not that fluent in R yet. I have also tried this:

annotationgr = build_annotations(genome='hg19', annotations='/home/alejandrarodrigu21/Downloads/gencode.v19.long_noncoding_RNAs.gtf.gz')
Error: ‘/home/alejandrarodrigu21/Downloads/gencode.v19.long_noncoding_RNAs.gtf.gz’ not in annotatr_cache

Could anyone help me with this? Thanks, Alejandra

rcavalcante commented 1 year ago

Hi,

Apologies for the very late reply. I think the lncRNA resource error might have been transient (not dissimilar to Issue #49). I was able to

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz

just now, and I was able to build all the hg19 resources that are built-in with

annot <- builtin_annotations()[grep("hg19", builtin_annotations())]
annotations <- build_annotations(genome = 'hg19', annotations = annot)

Have you, since you created the issue, been able to retrieve the lncRNA annotation that's built-in?

As to your other question about getting the file yourself, and the error you get about the unused extraCols argument, I am somewhat perplexed at the moment. On a lark, I grabbed a BED file of mouse enhancers and did

read_annotations('~/Downloads/mouse_permissive_enhancers_phase_1_and_2.bed.gz', name = 'test', genome = 'mm10')

which worked, but then when I tried the GTF file I got the same error as you.

So I think the issue is that the extraCols parameter isn't used in the subsequent function calls used for the GTF, but is for the BED. I'll have to figure out how to deal with this, so thanks for surfacing this issue.
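Until that is sorted out, one possible workaround is to skip read_annotations() for the GTF and read it with rtracklayer::import() directly, which returns a GRanges object. The sketch below is only an illustration under stated assumptions: the single GTF record is a toy stand-in written to a temporary file, and in practice the path would point at the downloaded gencode.v19.long_noncoding_RNAs.gtf.gz.

```r
library(rtracklayer)

# Toy one-record GTF written to a temp file (illustrative stand-in for the
# downloaded GENCODE lncRNA file; the record is for demonstration only).
gtf_path <- tempfile(fileext = ".gtf")
writeLines(
  paste(
    "chr1", "HAVANA", "gene", "29554", "31109", ".", "+", ".",
    'gene_id "ENSG00000243485.1"; transcript_id "ENSG00000243485.1"; gene_name "MIR1302-11";',
    sep = "\t"
  ),
  gtf_path
)

# Import as GRanges; the GTF attributes become metadata columns.
lnc_gtf <- import(gtf_path, format = "gtf")
lnc_gtf
```

The resulting object still has GTF-style column names (gene_id, transcript_id, gene_name), so it would likely need some reshaping before being used as an annotatr annotation set.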

With that said, please do check if the lncRNA resource issue was transient, and let me know.

Thanks, Raymond

rcavalcante commented 5 months ago

In the end, all annotatr truly needs for a set of annotations is a GRanges object (from the GenomicRanges package) with a few particular metadata columns.

tx_id, gene_id, and symbol are all optional (i.e. they can be NA).
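As a rough illustration of that point, here is a minimal sketch of a hand-built annotation object. The column layout (id, tx_id, gene_id, symbol, type) and the "hg19_custom_lncRNA" type string are assumptions modeled on what the built-in and custom annotations appear to look like, and the two ranges are toy values rather than real lncRNA coordinates.

```r
library(GenomicRanges)

# Toy ranges standing in for lncRNA records; real ones could come from
# the downloaded GTF instead.
lnc <- GRanges(
  seqnames = c("chr1", "chr1"),
  ranges = IRanges(start = c(29554, 34554), end = c(31109, 36081)),
  strand = c("+", "-")
)

n <- length(lnc)

# Metadata columns in the assumed annotatr layout. tx_id, gene_id, and
# symbol are left as NA because they are optional.
mcols(lnc) <- DataFrame(
  id      = paste0("lncRNA:", seq_len(n)),
  tx_id   = rep(NA_character_, n),
  gene_id = rep(NA_character_, n),
  symbol  = rep(NA_character_, n),
  type    = rep("hg19_custom_lncRNA", n)
)

lnc
```

An object shaped like this could then, presumably, be supplied as the annotations argument of annotate_regions() alongside the DMRs.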

rcavalcante commented 5 months ago

FWIW, all the lncRNA built-ins seem to be working fine.