Closed clersdom closed 4 years ago
Hi Clara,
I think the only issue is that in:
annots_custom = c('hg19_cpgs', 'hg19_basicgenes','hg19_enhancers_fantom', 'custom_lncRNAs')
You should replace custom_lncRNAs
with hg19_custom_lncRNAs
.
The interface for custom annotations is a little strange (I had some assistance from Bioconductor because they didn't agree with my initial implementation). In short, you don't actually assign the result of read_annotations()
to a variable because the custom annotations are held in an object called annotatr_cache
. Here is the example from the vignette:
## Use ENCODE ChIP-seq peaks for EZH2 in GM12878
## These files contain chr, start, and end columns
ezh2_file = system.file('extdata', 'Gm12878_Ezh2_peak_annotations.txt.gz', package = 'annotatr')
## Custom annotation objects are given names of the form genome_custom_name
read_annotations(con = ezh2_file, genome = 'hg19', name = 'ezh2', format = 'bed')
# To see the annotations you just created
print(annotatr_cache$get('hg19_custom_ezh2'))
# To see what's in annotatr_cache
print(annotatr_cache$list_env())
Let me know if that solves your problem.
Thanks, Raymond
Hi Raymond,
It has worked now, thanks! I have checked in
print(annotatr_cache$list_env()) [1] "hg19_custom_hg19_lncrna_gencode" "hg19_custom_lncrna"
So just replaced custom_lncRNAs
with hg19_custom_lncrna
as you have noted.
Thanks! Clara
Wonderful, I'll go ahead and close this issue. I have also made a fix for the lncRNA URL in hg19 which was mentioned in #17.
Hi @rcavalcante I was trying to build lncRNA annotations using the following command:
lncRNA = 'hg19_custom_lncRNAs'
anno_basic = build_annotations(genome = 'hg19', annotations = lncRNA)
However, it gives me the following error
Error: ‘hg19_custom_lncRNAs’ not in annotatr_cache
The hg19_lncrna_gencode is however working. Are the two same and is it fine to used gencode one in the place of custom set. If not how can I build annotations for custom lncRNA.
Also does 'hg19_cpgs', 'hg19_basicgenes', 'hg19_genes_intergenic', 'hg19_genes_intronexonboundaries' etc are for UCSC hg19?
Hello,
The only way there could be a hg19_custom_lncRNAs
annotation was if you created it yourself with the read_annotations()
function. You would need a BED-like file that defined the lncRNA to your satisfaction.
The hg19_lncrna_gencode
is the built-in resource that comes from: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz. You could customize the hg19_lncrna_gencode
annotations by filtering out whichever class of lncRNAs you didn't need.
> lncrnas_gr = build_annotations(genome = 'hg19', annotations = 'hg19_lncrna_gencode')
Building lncRNA transcripts...
trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz'
Content type 'unknown' length 2330848 bytes (2.2 MB)
==================================================
> table(stringr::str_split_fixed(lncrnas_gr$id, ':', 2)[,1])
3prime_overlapping_ncrna antisense lincRNA
25 9710 11780
miRNA misc_RNA processed_transcript
61 16 608
retained_intron rRNA sense_intronic
483 4 802
sense_overlapping snoRNA snRNA
330 72 7
As to your second question, the hg19 genic annotations are based on UCSC. The CpGs are also taken UCSC via AnnotationHub
ID AH5086
.
Hope that helps. And in future, please create a new issue rather than adding a comment onto a closed issue that isn't directly related to your question.
Hi @rcavalcante . Thanks for your prompt rejoinder and my apologies for opening a discussion on a closed one. The gencode lncrna does give broad counts of lncRNA usingthe following command
anno_lncrna = annotatr::build_annotations(genome = 'hg19', annotations = 'hg19_lncrna_gencode')
df_anno_lncRNA = annotate_regions( regions = dm_regions, annotations = anno_lncrna, ignore.strand = TRUE, quiet = FALSE) lncRNA = data.frame(df_anno_lncRNA)
but if we wish to get more resolved annotation like if they are snRNA or lincRNA in spite of broad annotation term such as hg19_lncrna_gencode
. Do I need to go and edit the downloaded gtf file?
No, if you look in the id
column, you will note that the particular type of lncRNA is contained therein.
> table(stringr::str_split_fixed(lncrnas_gr$id, ':', 2)[,1])
3prime_overlapping_ncrna antisense lincRNA
25 9710 11780
miRNA misc_RNA processed_transcript
61 16 608
retained_intron rRNA sense_intronic
483 4 802
sense_overlapping snoRNA snRNA
330 72 7
I made the decision to not enumerate these in the type
column because there were so many, and I thought figures would get too crowded. However, you should be able to change the type
column to your liking after building the lncRNA annotations.
lncrnas_gr = build_annotations(genome = 'hg19', annotations = 'hg19_lncrna_gencode')
subtypes = str_split_fixed(test$id, ':', 2)[,1]
lncrnas_gr$type = paste('hg19_lncrna', subtypes, sep = '_')
You can then see that the types are:
> unique(lncrnas_gr$type)
[1] "hg19_lncrna_lincRNA"
[2] "hg19_lncrna_miRNA"
[3] "hg19_lncrna_antisense"
[4] "hg19_lncrna_retained_intron"
[5] "hg19_lncrna_sense_intronic"
[6] "hg19_lncrna_snRNA"
[7] "hg19_lncrna_processed_transcript"
[8] "hg19_lncrna_snoRNA"
[9] "hg19_lncrna_sense_overlapping"
[10] "hg19_lncrna_3prime_overlapping_ncrna"
[11] "hg19_lncrna_misc_RNA"
[12] "hg19_lncrna_rRNA"
And when you go to create any figures, you will use these annotations in the annotation_order
parameter.
Hope that helps.
Hi,
I am trying to build a custom annotation for annotating lncRNAs, as below
lncRNAs_custom <- file.path("./","lncRNAs.bed") extraCols_lncRNA = c(symbol = 'character') custom_lncRNAs<- read_annotations(lncRNAs_custom, genome = "hg19", extraCols = extraCols_lncRNA, format = "bed", name = "lncRNAs")
head(custom_lncRNAs) GRanges object with 6 ranges and 5 metadata columns: seqnames ranges strand | id tx_id gene_id symbol type