marbl / CHM13

The complete sequence of a human genome

CHM13v2 to HG38 liftover, chain import problems #59

Closed by jamesdalg 2 years ago

jamesdalg commented 2 years ago

I'm having import issues with the new CHM13 chain using the liftOver package in R. The usual chains work for that package, but I can't get CHM13 to work. If there's a workaround, I'm very willing to try it.

> ch = import.chain(system.file(package="liftOver", "extdata", "hg38ToHg19.over.chain"))
> ch
Chain of length 25
names(25): chr22 chr21 chr19 chr20 chrY chr18 chrX chr17 chr16 chr15 chr14 chr13 chr12 chr11 chr10 chrM chr9 chr8 chr7 chr6 chr5 chr4 chr3 chr2 chr1
> ch = import.chain(system.file(package="liftOver", "extdata", "hg19ToHg38.over.chain"))
> ch
Chain of length 25
names(25): chr22 chr21 chr19 chr20 chr18 chrY chr17 chrX chr16 chr15 chr14 chr13 chr12 chr11 chr10 chrM chr9 chr8 chr7 chr6 chr5 chr4 chr3 chr2 chr1
> ch = import.chain(system.file(package="liftOver", "extdata", "grch38-chm13v2.chain"))
Error in .local(con, format, text, ...) : 
  expected 11 elements in header, got 1, on line 1
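
The error says the parser found a single element on line 1, so a reasonable first check is how the first header line of the file is delimited. The sketch below is only a diagnostic aid: `inspect_chain_header` is a hypothetical helper (not part of liftOver or rtracklayer), and it assumes the file starts with a `chain` header line as described in the UCSC chain format (13 whitespace-separated tokens when the trailing chain ID is present, 12 when it is missing).

```r
# Hypothetical helper: report how the first line of a chain file is delimited.
# It does not fix anything; it only shows what a parser would see on line 1.
inspect_chain_header <- function(path) {
  # gzfile() reads both gzip-compressed and plain-text files
  con <- gzfile(path, "rt")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  if (length(header) == 0) stop("file is empty: ", path)
  list(
    header              = header,
    # token count when splitting on single spaces only
    n_space_fields      = length(strsplit(header, " ", fixed = TRUE)[[1]]),
    # token count when splitting on any run of spaces or tabs
    n_whitespace_fields = length(strsplit(header, "[ \t]+")[[1]]),
    # TRUE if the header contains at least one tab character
    has_tabs            = grepl("\t", header, fixed = TRUE)
  )
}
```

Calling `inspect_chain_header()` on both a chain that imports cleanly (for example the bundled hg38ToHg19.over.chain) and the failing file makes any difference in delimiter or field count easy to spot.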

chainfiles.zip

jamesdalg commented 2 years ago

Here is my sessionInfo, so you can see the platform and package versions I'm using:

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] liftOver_1.20.0                         Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.15.0                     GO.db_3.15.0                           
 [6] OrganismDbi_1.38.0                      GenomicFeatures_1.48.0                  AnnotationDbi_1.58.0                    Biobase_2.56.0                          rtracklayer_1.56.0                     
[11] GenomicRanges_1.48.0                    GenomeInfoDb_1.32.1                     IRanges_2.30.0                          S4Vectors_0.34.0                        BiocGenerics_0.42.0                    
[16] gwascat_2.28.0                          magrittr_2.0.3                         

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                matrixStats_0.62.0          bit64_4.0.5                 filelock_1.0.2              progress_1.2.2              httr_1.4.3                  tools_4.2.0                
 [8] utf8_1.2.2                  R6_2.5.1                    DBI_1.1.2                   tidyselect_1.1.2            prettyunits_1.1.1           bit_4.0.4                   curl_4.3.2                 
[15] compiler_4.2.0              graph_1.74.0                cli_3.3.0                   xml2_1.3.3                  DelayedArray_0.22.0         readr_2.1.2                 RBGL_1.72.0                
[22] rappdirs_0.3.3              stringr_1.4.0               digest_0.6.29               Rsamtools_2.12.0            XVector_0.36.0              pkgconfig_2.0.3             MatrixGenerics_1.8.0       
[29] highr_0.9                   dbplyr_2.1.1                fastmap_1.1.0               BSgenome_1.64.0             rlang_1.0.2                 RSQLite_2.2.14              BiocIO_1.6.0               
[36] generics_0.1.2              BiocParallel_1.30.0         dplyr_1.0.9                 VariantAnnotation_1.42.0    RCurl_1.98-1.6              GenomeInfoDbData_1.2.8      Matrix_1.4-1               
[43] Rcpp_1.0.8.3                fansi_1.0.3                 lifecycle_1.0.1             stringi_1.7.6               yaml_2.3.5                  SummarizedExperiment_1.26.1 zlibbioc_1.42.0            
[50] BiocFileCache_2.4.0         grid_4.2.0                  blob_1.2.3                  parallel_4.2.0              snpStats_1.46.0             crayon_1.5.1                lattice_0.20-45            
[57] Biostrings_2.64.0           splines_4.2.0               hms_1.1.1                   KEGGREST_1.36.0             knitr_1.39                  pillar_1.7.0                rjson_0.2.21               
[64] biomaRt_2.52.0              XML_3.99-0.9                glue_1.6.2                  evaluate_0.15               data.table_1.14.2           BiocManager_1.30.17         png_0.1-7                  
[71] vctrs_0.4.1                 tzdb_0.3.0                  purrr_0.3.4                 tidyr_1.2.0                 assertthat_0.2.1            cachem_1.0.6                xfun_0.30                  
[78] restfulr_0.0.13             survival_3.3-1              tibble_3.1.7                GenomicAlignments_1.32.0    memoise_2.0.1               ellipsis_0.3.2        

diekhans commented 2 years ago

Those chain files lack chain IDs (the last field on each chain header line), which could cause problems. Try grabbing these from UCSC instead:

https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/liftOver/

  chm13v2-hg19.chain.gz                          2022-04-09 10:35  2.1M  
  chm13v2-hg19.over.chain.gz                     2022-04-24 20:58  2.1M  
  chm13v2-hg19_chrM.over.chain.gz                2022-04-24 20:41  2.1M  
  chm13v2-hg19_chrMT.over.chain.gz               2022-04-24 20:41  2.1M  
  chm13v2-hg38.over.chain.gz                     2022-04-24 20:41  2.1M  
  hg19-chm13v2.over.chain.gz                     2022-04-24 20:59  2.1M  
  hg19_chrM-chm13v2.over.chain.gz                2022-04-24 20:41  2.1M  
  hg19_chrMT-chm13v2.over.chain.gz               2022-04-24 20:41  2.1M  
  hg38-chm13v2.over.chain.gz                     2022-04-24 20:41  2.1M  
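
A minimal sketch of fetching one of these files from R, assuming network access. `fetch_chain` is a hypothetical helper name, the base URL is the directory linked above, and the file is decompressed before import on the assumption that `import.chain` wants a plain-text chain file rather than a `.gz`.

```r
# Directory quoted in this thread
ucsc_liftover_dir <- "https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/liftOver/"

# Hypothetical helper: download a gzipped chain file and write an
# uncompressed copy next to it. Returns the path of the uncompressed file.
fetch_chain <- function(file_name,
                        base_url = ucsc_liftover_dir,
                        dest_dir = tempdir()) {
  gz_path <- file.path(dest_dir, file_name)
  # mode = "wb" keeps the gzip bytes intact on Windows
  download.file(paste0(base_url, file_name), gz_path, mode = "wb", quiet = TRUE)

  chain_path <- sub("\\.gz$", "", gz_path)
  con <- gzfile(gz_path, "rt")
  on.exit(close(con))
  # Base-R decompression: read the text through gzfile(), write it back plain
  writeLines(readLines(con), chain_path)
  chain_path
}

# Usage (needs network access and the rtracklayer package):
# chain_path <- fetch_chain("hg38-chm13v2.over.chain.gz")
# ch <- rtracklayer::import.chain(chain_path)
# ch
```

The direction is encoded in the file name: `hg38-chm13v2.over.chain.gz` goes from hg38 to CHM13v2, and `chm13v2-hg38.over.chain.gz` goes the other way.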

jamesdalg commented 2 years ago

The chain loads! Thanks!