zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray
43 stars 12 forks source link

seqExport() fails for data with ploidy 1 #29

Closed jemunro closed 6 years ago

jemunro commented 6 years ago

Error message:

Error in index.gdsn(gdsfile, "phase/data") : 
  No such GDS node "phase/data"!

I have tested this on data with ploidy 2 and it works fine.

Session info:

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.4 (Final)

Matrix products: default
BLAS: /software/R/R-3.4.3/lib64/R/lib/libRblas.so
LAPACK: /software/R/R-3.4.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2             inline_0.3.15              viridis_0.5.1              viridisLite_0.3.0         
 [5] cowplot_0.9.2              ggplot2_2.2.1              magrittr_1.5               tibble_1.4.2              
 [9] dplyr_0.7.5                readr_1.1.1                stringr_1.3.1              hashmap_0.2.2             
[13] missMDA_1.12               moimix_0.0.1.9001          flexmix_2.3-14             lattice_0.20-35           
[17] SeqArray_1.18.2            gdsfmt_1.14.1              VariantAnnotation_1.24.5   Rsamtools_1.30.0          
[21] Biostrings_2.46.0          XVector_0.18.0             SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
[25] matrixStats_0.53.1         Biobase_2.38.0             GenomicRanges_1.30.3       GenomeInfoDb_1.14.0       
[29] IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] nlme_3.1-137             mcmc_0.9-5               bitops_1.0-6             bit64_0.9-7             
 [5] RColorBrewer_1.1-2       progress_1.1.2           httr_1.3.1               tools_3.4.3             
 [9] utf8_1.1.3               R6_2.2.2                 rpart_4.1-13             lazyeval_0.2.1          
[13] DBI_0.8                  mgcv_1.8-23              colorspace_1.3-2         nnet_7.3-12             
[17] gridExtra_2.3            tidyselect_0.2.4         prettyunits_1.0.2        RMySQL_0.10.14          
[21] bit_1.1-12               compiler_3.4.3           cli_1.0.0                quantreg_5.35           
[25] flashClust_1.01-2        mice_2.46.0              SparseM_1.77             rtracklayer_1.38.3      
[29] scales_0.5.0             mvtnorm_1.0-7            digest_0.6.15            GWASExactHW_1.01        
[33] MCMCpack_1.4-3           pkgconfig_2.0.1          FactoMineR_1.41          BSgenome_1.46.0         
[37] rlang_0.2.0              rstudioapi_0.7           RSQLite_2.1.0            bindr_0.1.1             
[41] BiocParallel_1.12.0      RCurl_1.95-4.10          modeltools_0.2-21        GenomeInfoDbData_1.0.0  
[45] leaps_3.0                Matrix_1.2-14            Rcpp_0.12.17             munsell_0.4.3           
[49] scatterplot3d_0.3-41     stringi_1.1.7            yaml_2.1.18              MASS_7.3-49             
[53] zlibbioc_1.24.0          Rtsne_0.13               plyr_1.8.4               grid_3.4.3              
[57] blob_1.1.1               crayon_1.3.4             splines_3.4.3            GenomicFeatures_1.30.3  
[61] hms_0.4.2                knitr_1.20               pillar_1.2.1             logistf_1.22            
[65] codetools_0.2-15         biomaRt_2.34.2           XML_3.98-1.11            glue_1.2.0              
[69] MatrixModels_0.4-1       gtable_0.2.0             purrr_0.2.4              tidyr_0.8.0             
[73] assertthat_0.2.0         SeqVarTools_1.16.1       coda_0.19-1              survival_2.41-3         
[77] GenomicAlignments_1.14.2 AnnotationDbi_1.40.0     memoise_1.1.0            cluster_2.0.7-1 
zhengxwen commented 6 years ago

Could you please show me the file structure of gdsfile?

print(gdsfile)
jemunro commented 6 years ago
Object of class "SeqVarGDSClass"
File: /data/p1.gds (301.9M)
+    [  ] *
|--+ description   [  ] *
|--+ sample.id   { Str8 325 ZIP_ra(29.7%), 1.0K } *
|--+ variant.id   { Int32 732229 ZIP_ra(34.6%), 989.1K } *
|--+ position   { Int32 732229 ZIP_ra(39.2%), 1.1M } *
|--+ chromosome   { Str8 732229 ZIP_ra(0.20%), 16.7K } *
|--+ allele   { Str8 732229 ZIP_ra(20.5%), 1.9M } *
|--+ genotype   [  ] *
|  |--+ data   { Bit2 1x325x792285 ZIP_ra(20.2%), 12.4M } *
|  |--+ extra.index   { Int32 3x0 ZIP_ra, 16B } *
|  \--+ extra   { Int16 0 ZIP_ra, 16B }
|--+ phase   [  ]
|--+ annotation   [  ]
|  |--+ id   { Str8 732229 ZIP_ra(0.10%), 749B } *
|  |--+ qual   { Float32 732229 ZIP_ra(85.6%), 2.4M } *
|  |--+ filter   { Int32,factor 732229 ZIP_ra(0.10%), 2.8K } *
|  |--+ info   [  ]
|  |  |--+ AC   { Int32 998798 ZIP_ra(21.3%), 830.2K } *
|  |  |--+ AF   { Float32 998798 ZIP_ra(33.0%), 1.3M } *
|  |  |--+ AN   { Int32 732229 ZIP_ra(14.5%), 416.1K } *
|  |  |--+ BaseQRankSum   { Float32 732229 ZIP_ra(44.2%), 1.2M } *
|  |  |--+ ClippingRankSum   { Float32 732229 ZIP_ra(5.73%), 163.8K } *
|  |  |--+ DP   { Int32 732229 ZIP_ra(55.4%), 1.5M } *
|  |  |--+ DS   { Bit1 732229 ZIP_ra(0.13%), 128B } *
|  |  |--+ END   { Int32 732229 ZIP_ra(0.10%), 2.8K } *
|  |  |--+ ExcessHet   { Float32 732229 ZIP_ra(0.10%), 2.8K } *
|  |  |--+ FS   { Float32 732229 ZIP_ra(35.1%), 1002.9K } *
|  |  |--+ InbreedingCoeff   { Float32 732229 ZIP_ra(0.10%), 2.8K } *
|  |  |--+ MLEAC   { Int32 998798 ZIP_ra(21.6%), 843.7K } *
|  |  |--+ MLEAF   { Float32 998798 ZIP_ra(32.9%), 1.3M } *
|  |  |--+ MQ   { Float32 732229 ZIP_ra(29.4%), 840.8K } *
|  |  |--+ MQRankSum   { Float32 732229 ZIP_ra(15.9%), 454.2K } *
|  |  |--+ QD   { Float32 732229 ZIP_ra(51.7%), 1.4M } *
|  |  |--+ RAW_MQ   { Float32 732229 ZIP_ra(0.10%), 2.8K } *
|  |  |--+ ReadPosRankSum   { Float32 732229 ZIP_ra(45.1%), 1.3M } *
|  |  \--+ SOR   { Float32 732229 ZIP_ra(49.0%), 1.4M } *
|  \--+ format   [  ]
|     |--+ AD   [  ] *
|     |  \--+ data   { VL_Int 325x1731027 ZIP_ra(13.2%), 79.0M } *
|     |--+ DP   [  ] *
|     |  \--+ data   { VL_Int 325x732229 ZIP_ra(19.6%), 56.4M } *
|     |--+ GQ   [  ] *
|     |  \--+ data   { VL_Int 325x732229 ZIP_ra(3.48%), 16.1M } *
|     |--+ MIN_DP   [  ] *
|     |  \--+ data   { VL_Int 325x0 ZIP_ra, 16B } *
|     |--+ PL   [  ] *
|     |  \--+ data   { VL_Int 325x1731027 ZIP_ra(14.0%), 116.8M } *
|     |--+ RGQ   [  ] *
|     |  \--+ data   { VL_Int 325x0 ZIP_ra, 16B } *
|     \--+ SB   [  ] *
|        \--+ data   { VL_Int 325x0 ZIP_ra, 16B } *
\--+ sample.annotation   [  ]
jemunro commented 6 years ago

As a work-around, I have been using a modified version of seqExport() with the following block of code deleted:

  sync.gds(outfile)
  node <- addfolder.gdsn(outfile, "phase")
  put.attr.gdsn(node, val = index.gdsn(gdsfile, "phase"))
  cp.phase(node, S$sample.sel, S$variant.sel)
  if (prod(objdesp.gdsn(index.gdsn(gdsfile, "phase/extra.index"))$dim) <= 
    0) {
    copyto.gdsn(node, index.gdsn(gdsfile, "phase/extra.index"))
    copyto.gdsn(node, index.gdsn(gdsfile, "phase/extra"))
  }
  else stop("Not implemented in 'phase/extra.index', please contact the author.")
zhengxwen commented 6 years ago

Thanks! I will fix it.