robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0

Where was the centromere file downloaded? #275

Closed yuanyuanhe2021 closed 2 years ago

yuanyuanhe2021 commented 3 years ago

Hi, I am trying to run the GENOVA pipeline, but I am having difficulties downloading the centromere file used in the vignette, hg19_cytobandAcen.bed. I could only find a centromere file for hg38 at UCSC, and it differs from the hg19_cytobandAcen.bed shown in the GENOVA vignette. Could you please tell me how I can download the hg19_cytobandAcen.bed file used in the vignette? Thank you very much.

teunbrand commented 2 years ago

Hello there,

That file is just a local file on our system. It is essentially the UCSC cytoBand.txt.gz file for hg19, filtered for centromeres and summarised so that there is one range per chromosome. Below you can find how to replicate our local file.

library(data.table)

# This local file probably won't work on your system
centros <- "/DATA/references/human/hg19/cytobandAcen.bed"
centros <- fread(centros)

ucsc_centros <- "https://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz"
ucsc_centros <- fread(ucsc_centros)

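# Filter for centromeres: V5 is the Giemsa stain column, "acen" marks centromeric bands
ucsc_centros <- ucsc_centros[V5 == "acen", V1:V3]

# Merge the acen bands into one range per chromosome
ucsc_centros <- ucsc_centros[, .(V2 = min(V2), V3 = max(V3)), by = "V1"]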

# Are the UCSC centromeres and our file now the same?
identical(centros, ucsc_centros)
#> [1] TRUE
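
If you want the result as a file on disk, like the one the vignette points to, something along these lines should work. This is only a sketch: the helper name `centromeres_from_cytoband` is made up for illustration, the toy table stands in for the downloaded cytoBand data (its coordinates are not real hg19 values), and the output goes to a temporary directory, so adjust the path to your own setup.

```r
library(data.table)

# Hypothetical helper (not part of GENOVA): keep the "acen" bands of a
# cytoBand-style table and collapse them to one range per chromosome.
centromeres_from_cytoband <- function(cytoband) {
  acen <- cytoband[V5 == "acen", .(V1, V2, V3)]
  acen[, .(V2 = min(V2), V3 = max(V3)), by = "V1"]
}

# Toy rows in the cytoBand layout: chrom, start, end, band name, stain.
# These coordinates are invented; use the fread() of the UCSC file instead.
toy <- data.table(
  V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  V2 = c(0L, 100L, 200L, 0L, 50L, 80L),
  V3 = c(100L, 200L, 300L, 50L, 80L, 120L),
  V4 = c("p11.2", "p11.1", "q11", "p11.2", "p11.1", "q11.1"),
  V5 = c("gpos50", "acen", "acen", "gneg", "acen", "acen")
)

centros <- centromeres_from_cytoband(toy)

# Write a headerless, tab-separated, three-column BED file.
# The file name mirrors the vignette; the directory is an assumption.
out_file <- file.path(tempdir(), "hg19_cytobandAcen.bed")
fwrite(centros, out_file, sep = "\t", col.names = FALSE)
```

With the real data, you would pass the table returned by `fread()` on the UCSC URL to the helper instead of `toy`, and then read the written file back in wherever the vignette reads hg19_cytobandAcen.bed.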