yuanyuanhe2021 closed this issue 2 years ago
Hello there,
That file is just a local file on our system. It is essentially the UCSC cytoBand.txt.gz file, filtered for centromeres and summarised so that there is one range per chromosome. The code below shows how to reproduce our local file.
library(data.table)
# This local file probably won't work on your system
centros <- "/DATA/references/human/hg19/cytobandAcen.bed"
centros <- fread(centros)
ucsc_centros <- "https://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz"
ucsc_centros <- fread(ucsc_centros)
# Filter for centromeres
ucsc_centros <- ucsc_centros[V5 == "acen", V1:V3]
# Merge bins
ucsc_centros <- ucsc_centros[, .(V2 = min(V2), V3 = max(V3)), by = "V1"]
# Are the UCSC centromeres and our file now the same?
identical(centros, ucsc_centros)
#> [1] TRUE
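If it helps to see what the filter and merge steps do without downloading anything, here is a minimal sketch on a made-up table. The coordinates are invented for illustration and are not real hg19 values, and the output file name `toy_cytobandAcen.bed` is just a hypothetical example. The last step writes the result as a headerless, tab-separated file, which is my assumption about a convenient BED-like layout rather than a statement about how our local file was created.

```r
library(data.table)

# Toy stand-in for cytoBand.txt.gz (invented coordinates, not real hg19 data).
# Columns mimic the UCSC layout: chrom, start, end, band name, stain.
toy <- data.table(
  V1 = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  V2 = c(0L, 100L, 150L, 200L, 0L, 80L, 120L, 160L),
  V3 = c(100L, 150L, 200L, 400L, 80L, 120L, 160L, 300L),
  V4 = c("p12", "p11.1", "q11.1", "q12", "p12", "p11.1", "q11.1", "q12"),
  V5 = c("gneg", "acen", "acen", "gpos50", "gneg", "acen", "acen", "gneg")
)

# Keep only the centromeric bands (two per chromosome: p-arm and q-arm side)
toy_centros <- toy[V5 == "acen", V1:V3]

# Collapse the two bands into a single range per chromosome
toy_centros <- toy_centros[, .(V2 = min(V2), V3 = max(V3)), by = "V1"]

# Write a headerless, tab-separated file (hypothetical file name)
out <- file.path(tempdir(), "toy_cytobandAcen.bed")
fwrite(toy_centros, out, sep = "\t", col.names = FALSE)
```

Running the same last step on `ucsc_centros` from the code above should give you a local file you can point the vignette code at.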
Hi, I am trying to run the Genova pipeline, but I am having difficulties downloading the centromere file used in the vignette, hg19_cytobandAcen.bed. I could only find a centromere file for hg38 at UCSC, and it differs from the hg19_cytobandAcen.bed shown in the Genova vignette. Could you please tell me how to download the hg19_cytobandAcen.bed file used in the vignette? Thank you very much.