hbctraining / DGE_workshop_salmon_online

https://hbctraining.github.io/DGE_workshop_salmon_online/

Creating annotation file tx2gene for NCBI human transcriptome #30

Closed ellalalalalalala closed 1 year ago

ellalalalalalala commented 1 year ago

Hi,

Thanks a lot for the fantastic workshop on DGE analysis; I really enjoyed it and learned a lot. :)

I am now trying to run the analysis on my own data. For previous SNP analyses, and now for the Salmon quantification, I used the NCBI RefSeq Transcripts FASTA (https://www.ncbi.nlm.nih.gov/genome/guide/human/). I am therefore trying to build my tx2gene annotation file from the NCBI annotation. Could you recommend which ah$dataprovider to query? Is there anything else I should adapt or watch out for, compared to the presented workflow using ensembldb?

Thanks a lot in advance for your help. :)

Best wishes, Ella

mistrm82 commented 1 year ago

Hi @ellalalalalalala

I think for NCBI annotations you might be better off using OrgDb. It will usually be the most current build, so using this will get you hg38, which looks like what you want. You will see that the data is current as of Sept 2021.

query(ah, c("Homo sapiens", "OrgDb"))

AnnotationHub with 1 record
# snapshotDate(): 2021-10-20
# names(): AH95959
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Homo sapiens
# $rdataclass: OrgDb
# $rdatadateadded: 2021-10-08
# $title: org.Hs.eg.db.sqlite
# $description: NCBI gene ID based annotations about Homo sapiens
# $taxonomyid: 9606
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/pub/current_fasta
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation") 
# retrieve record with 'object[["AH95959"]]' 
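
For illustration, here is a minimal sketch of how the OrgDb record above might be turned into a tx2gene table. The helper names (`strip_version`, `fetch_orgdb`, `build_tx2gene`) are hypothetical and not part of the workshop material. The sketch assumes that the transcript names in the Salmon output are RefSeq accessions with a version suffix (e.g. `NM_000014.6`) and that the `REFSEQ` keys in the OrgDb carry no version, so the suffix is stripped before the lookup; please check both assumptions against your own files.

```r
# Hypothetical helper: drop a trailing version suffix such as ".6"
# from RefSeq accessions ("NM_000014.6" becomes "NM_000014").
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Hypothetical helper: download the OrgDb record shown in the query output.
# Needs the AnnotationHub package and a network connection.
fetch_orgdb <- function(id = "AH95959") {
  ah <- AnnotationHub::AnnotationHub()
  ah[[id]]
}

# Hypothetical helper: build a two-column table with the transcript ID
# first and the gene identifier second, the layout tximport expects.
# Assumption: the OrgDb REFSEQ keys are unversioned accessions.
build_tx2gene <- function(orgdb, tx_ids, gene_col = "SYMBOL") {
  bare <- strip_version(tx_ids)
  map <- AnnotationDbi::select(orgdb,
                               keys = unique(bare),
                               keytype = "REFSEQ",
                               columns = gene_col)
  # Keep the original (versioned) names so they match the Salmon output;
  # transcripts without a match get NA and can be inspected afterwards.
  data.frame(tx_id = tx_ids,
             gene_id = map[[gene_col]][match(bare, map$REFSEQ)],
             stringsAsFactors = FALSE)
}
```

Something like `tx2gene <- build_tx2gene(fetch_orgdb(), tx_ids)`, where `tx_ids` is the vector of transcript names from your quant files, would then give a table to pass to tximport. It is worth checking how many rows have `NA` in `gene_id` before going further. If you would rather keep unversioned IDs in the table, tximport also has an `ignoreTxVersion` argument that may be useful here.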

Hope this helps!