Closed zhangjy859 closed 1 year ago
I wish to report that this is an issue as well. If anyone is familiar with a workaround, that would be incredibly helpful. It happens with the latest version of rtracklayer and on both chains.
> hg38_to_chm13_chain<-rtracklayer::import.chain("grch38-chm13v2.chain")
Error in .local(con, format, text, ...) :
expected 11 elements in header, got 1, on line 1
> hg38_to_chm13_chain<-rtracklayer::import.chain("chm13v2-grch38.chain")
Error in .local(con, format, text, ...) :
expected 11 elements in header, got 3, on line 4
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_2.0.3 StructuralVariantAnnotation_1.12.0 rtracklayer_1.56.1 VariantAnnotation_1.42.1
[5] Rsamtools_2.12.0 Biostrings_2.64.1 XVector_0.36.0 regioneR_1.28.0
[9] GenomicInteractions_1.32.0 InteractionSet_1.26.1 SummarizedExperiment_1.26.1 Biobase_2.56.0
[13] MatrixGenerics_1.8.1 matrixStats_0.63.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4
[17] IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 rjson_0.2.21 deldir_1.0-9 biovizBase_1.46.0 htmlTable_2.4.1 base64enc_0.1-3
[7] dichromat_2.0-0.1 rstudioapi_0.14 bit64_4.0.5 AnnotationDbi_1.58.0 fansi_1.0.4 xml2_1.3.4
[13] codetools_0.2-19 cachem_1.0.8 knitr_1.42 Formula_1.2-5 cluster_2.1.4 dbplyr_2.3.2
[19] png_0.1-8 compiler_4.2.3 httr_1.4.6 backports_1.4.1 Matrix_1.5-4.1 fastmap_1.1.1
[25] lazyeval_0.2.2 cli_3.6.1 htmltools_0.5.5 prettyunits_1.1.1 tools_4.2.3 igraph_1.4.2
[31] gtable_0.3.3 glue_1.6.2 GenomeInfoDbData_1.2.8 dplyr_1.1.2 rappdirs_0.3.3 Rcpp_1.0.10
[37] vctrs_0.6.2 xfun_0.39 stringr_1.5.0 lifecycle_1.0.3 restfulr_0.0.15 GS3DKViz_1.0.2
[43] ensembldb_2.22.0 XML_3.99-0.14 zlibbioc_1.42.0 scales_1.2.1 BSgenome_1.64.0 hms_1.1.3
[49] ProtGenerics_1.30.0 parallel_4.2.3 AnnotationFilter_1.22.0 RColorBrewer_1.1-3 yaml_2.3.7 curl_5.0.0
[55] memoise_2.0.1 gridExtra_2.3 ggplot2_3.4.2 biomaRt_2.52.0 rpart_4.1.19 latticeExtra_0.6-30
[61] stringi_1.7.12 RSQLite_2.3.1 BiocIO_1.6.0 checkmate_2.2.0 GenomicFeatures_1.48.4 filelock_1.0.2
[67] BiocParallel_1.30.4 rlang_1.1.1 pkgconfig_2.0.3 bitops_1.0-7 evaluate_0.21 lattice_0.21-8
[73] GenomicAlignments_1.32.1 htmlwidgets_1.6.2 bit_4.0.5 tidyselect_1.2.0 R6_2.5.1 generics_0.1.3
[79] Hmisc_5.1-0 DelayedArray_0.22.0 DBI_1.1.3 pillar_1.9.0 foreign_0.8-84 KEGGREST_1.36.3
[85] RCurl_1.98-1.12 nnet_7.3-19 tibble_3.2.1 crayon_1.5.2 interp_1.1-4 utf8_1.2.3
[91] BiocFileCache_2.4.0 rmarkdown_2.21 jpeg_0.1-10 progress_1.2.2 grid_4.2.3 data.table_1.14.8
[97] blob_1.2.4 digest_0.6.31 munsell_0.5.0 Gviz_1.42.1
I realize I had an issue similar to this last year. One workaround that was suggested was to use crossmap. https://github.com/marbl/CHM13/issues/61 This will work for bed files, but not for SVs in VCF format. I can't think of an easy way to work around this other than to fix the chain file. In any case, hopefully this will help someone else until the chain file can be fixed.
I fixed one of them. https://github.com/lawremi/rtracklayer/issues/23 Essentially, you can fix the chm13v2 to hg38 with sed. rtracklayer doesn't work with spaces instead of tabs.
However, the more important direction (migrating to CHM13) does not work.
sed -r 's/^([0-9]+) ([0-9]+) ([0-9]+)$/\1\t\2\t\3/' chm13v2-grch38.chain > chm13v2-grch38-tabs.chain
sed -r 's/^([0-9]+) ([0-9]+) ([0-9]+)$/\1\t\2\t\3/' grch38-chm13v2.chain > grch38-chm13v2-tabs.chain
chm13_to_hg38_chain<-rtracklayer::import.chain("chm13v2-grch38-tabs.chain")
> hg38_to_chm13_chain<-rtracklayer::import.chain("grch38-chm13v2-tabs.chain")
Error in .local(con, format, text, ...) :
expected 11 elements in header, got 1, on line 1
This is caused by the chains not having chainIds. This was fixed, but apparently never updated on S3. I put the fixed chains here:
Seems to work!
> hg38_to_chm13_chain<-rtracklayer::import.chain("hg38-chm13v2.over.chain")
> chm13_to_hg38_chain<-rtracklayer::import.chain("chm13v2-hg38.over.chain")
no error messages.
Hi, @diekhans,
Thanks for updated liftover chain, but There are some issues here.
In fact, I failed to fix the problems in grch38-chm13v2.chain
at that time, so as a bypass solution, I used grch38-chm13v2.chain
and UCSC liftover
to complete liftover and generate a large amount of data. It didn't report an error, but I was wondering if there was an underlying problem and if I needed to rerun my related process under fix's grch38-chm13v2.chain.
From my testing with limited data, there doesn't seem to be a significant difference in output, but I'm not sure if there is a potential issue, especially if I plan to use the modified file later in the process without updating the previous data
The problem with the original chains was that the chain id column was set to zero on all the chains. This causes liftOver to generate an error. However, I don't believe that the chain ids are used by leftover. We just ran a program to add chain ids. I don't know what your bypass solution was, so I can't comment if it would make a difference.
Because the old chain file can be accpect by UCSC liftover
, so I just write a ucsc liftover cmd warpper with R, its worked by convert my Granges to bed file, and use ucsc liftover to liftover the bed, and read the converted (liftover) bed as results, so if the old and new chain not effect ucsc liftover, its ok for me.
I have run a quick check:
# old
../liftOver hg38_bed ../grch38-chm13v2.chain test/hg38_high_to_chm13.old.bed test/hg38_to_chm13.old.unmapped -bedPlus=3
# new
../liftOver hg38_bed hg38-chm13v2.over.chain test/hg38_to_chm13.new.bed test/hg38_high_to_chm13.new.unmapped -bedPlus=3
md5sum hg38_to_chm13.old.bed
d500b2c86eac071a49edb865ae7c02e6 hg38_to_chm13.old.bed
md5sum hg38_to_chm13.new.bed
d500b2c86eac071a49edb865ae7c02e6 hg38_to_chm13.new.bed
Just wanted to add, in case it is helpful for anyone, the following snippet that worked for me. Replaces tabs with spaces in the chain header and creates unique chain ids:
awk '/^chain/ {$1=$1; OFS=" "; $NF = "chain" sprintf("%03d", ++count)} 1' original.chain > fixed.chain
Hi, I am noted that CHM13 liftover chain is incompatible with the import.chain function in R package rtracklayer
I have checked import.chain is work well with UCSC liftover chain, eg
hg38ToHg19.over.chain.gz
(http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz)Regrads
Zhang