Closed rcavalcante closed 3 years ago
Hi @rcavalcante
Can you lead with an example. Like if we wish to download the hg38 lncRNA coordinates how will we do that?
annotatr::build_lncrna_annots(genome = "hg38")
Error: 'build_lncrna_annots' is not an exported object from 'namespace:annotatr'
##also tried this way ###
annotatr::build_annotations(genome = 'hg38', annotations = 'hg38_lncrna_gencode')
snapshotDate(): 2020-04-27
Building lncRNA transcripts...
Error in .local(x, i, j = j, ...) : 'i' must be length 1
How can I download it? Please help
EDIT1: Okay so looking at the source code I realized that for hg19 you have only one lncRNA files but for hg38, you are querying annotation hub that returns two IDs
> ID
[1] "AH49559" "AH75123"
AH49559 sub-set of the main annotation files on the reference chromosomes. They contain only the lncRNA genes. Long non-coding RNA genes are considered the genes with any of those biotypes: 'processed_transcript', 'lincRNA', '3prime_overlapping_ncrna', 'antisense
AH75123 sub-set of the main annotation files on the reference chromosomes. They contain only the lncRNA genes. Long non-coding RNA genes are considered the genes with any of those biotypes: 'processed_transcript', 'lincRNA', '3prime_overlapping_ncrna', 'antisense', 'non_coding', 'sense_intronic' , 'sense_overlapping' , 'TEC' , 'known_ncrna'.
Maybe a good workaround could be to choose the latest release as default for lncRNA or use something like this that provided both ID and the information about the annotations to the users
print(GenomicRanges::mcols(query)[GenomicRanges::mcols(query)$genome == hub_genome, c('genome','description')])
ID = readline()
The user then can enter their desired ID with which they wish to continue having read that description
EDIT2 For some reason the latest release could not be downloaded using
> ah[[ID[2]]]
loading from cache
Error: failed to load resource
name: AH75123
title: gencode.v31.long_noncoding_RNAs.gff3.gz
reason: argument is of length zero
Can you help me what to use in annotation_class in build_ah_annots function for lncrna?
Okay so after so many hit and trials I was able to do it After looking at the source code and issue posted here and of course struggling for hours, the following code worked.
You should include this line in manual: ##Keep the format intact i.e. in case if you are building custom annotation from BED file or AH, keep annotation_class = 'custom'.
code = c('lncrna' = 'AH49559')
build_ah_annots(genome = 'hg38', ah_codes = code, annotation_class = 'custom')
print(annotatr_cache$list_env())
##print(annotatr_cache$list_env())
##[1] "hg38_custom_lncrna"
anno_lncrna = annotatr::build_annotations(genome = 'hg38', annotations = 'hg38_custom_lncrna')
Again, I apologize for the late reply owing to the holidays.
I agree that the process of annotation creation from resources (and in general) could be more straightforward, and I appreciate the suggestions you've made.
In
build_lncrna_annots()
we're doing something a bit roundabout by querying and then looking for the genome. This worked when there was only onegenome == GRCh38
but now there are two, and this function isn't going to work so well.