suchestoncampbelllab / gwasurvivr

GWAS Survival Package in R
11 stars 12 forks source link

sangerCoxSurv fails in tutorial #9

Closed hhg7 closed 4 years ago

hhg7 commented 4 years ago

I'm trying to go through the tutorial for gwasurvivr in the vignette: https://bioconductor.org/packages/devel/bioc/vignettes/gwasurvivr/inst/doc/gwasurvivr_Introduction.html

but as I'm going through, I'm executing the code that I see in the R session examples:


library(gwasurvivr)
print("finished loading gwasurvivr")
vcf.file <- system.file(package="gwasurvivr","extdata", "michigan.chr14.dose.vcf.gz")
pheno.fl <- system.file(package="gwasurvivr", "extdata", "simulated_pheno.txt")
pheno.file <- read.table(pheno.fl, sep=" ", header=TRUE, stringsAsFactors = FALSE)
pheno.file$SexFemale <- ifelse(pheno.file$sex=="female", 1L, 0L)
sample.ids <- pheno.file[pheno.file$group=="experimental",]$ID_2

print("I am running michiganCoxSurv...");
michiganCoxSurv(vcf.file=vcf.file,
                covariate.file=pheno.file,
                id.column="ID_2",
                sample.ids=sample.ids,
                time.to.event="time",
                event="event",
                covariates=c("age", "SexFemale", "DrugTxYes"),
                inter.term=NULL,
                print.covs="only",
                out.file="michigan_only",
                r2.filter=0.3,
                maf.filter=0.005,
                chunk.size=100,
                verbose=TRUE,
                clusterObj=NULL)
print("I finished michiganCoxSurv");
# recode sex column and remove first column 
pheno.file$SexFemale <- ifelse(pheno.file$sex=="female", 1L, 0L)
# select only experimental group sample.ids
sample.ids <- pheno.file[pheno.file$group=="experimental",]$ID_2
head(sample.ids)
print("running sangerCoxSurv...");
sangerCoxSurv(vcf.file=vcf.file,
              covariate.file=pheno.file,
              id.column="ID_2",
              sample.ids=sample.ids,
              time.to.event="time",
              event="event",
              covariates=c("age", "SexFemale", "DrugTxYes"),
              inter.term=NULL,
              print.covs="only",
              out.file="sanger_only",
              info.filter=0.3,
              maf.filter=0.005,
              chunk.size=100,
              verbose=TRUE,
              clusterObj=NULL)
print("I finished sangerCoxSurv");

However, this fails and gives a strange error with sangerCoxSurv:

[1] "running sangerCoxSurv..."
Analysis started on 2020-04-14 at 17:07:36
Covariates included in the models are: age, DrugTxYes, SexFemale
52 samples are included in the analysis
Analyzing chunk 0-100
Error in `$<-.data.frame`(`*tmp*`, "RefPanelAF", value = list()) : 
  replacement has 0 rows, data has 3
Calls: sangerCoxSurv -> coxVcfSanger -> $<- -> $<-.data.frame
In addition: Warning message:
In .vcf_usertag(map, tag, nm, verbose) :
  ScanVcfParam ‘info’ fields not found in  header: ‘RefPanelAF’ ‘TYPED’ ‘INFO’
Execution halted```

why am I getting this error? did I do something wrong?

session information:```

> library(gwasurvivr)
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gwasurvivr_1.4.0

loaded via a namespace (and not attached):
 [1] Biobase_2.46.0              httr_1.4.1                 
 [3] tidyr_1.0.2                 bit64_0.9-7                
 [5] splines_3.6.3               assertthat_0.2.1           
 [7] askpass_1.1                 BiocFileCache_1.10.2       
 [9] stats4_3.6.3                GWASTools_1.32.0           
[11] blob_1.2.1                  BSgenome_1.54.0            
[13] GenomeInfoDbData_1.2.2      GWASExactHW_1.01           
[15] Rsamtools_2.2.3             progress_1.2.2             
[17] pillar_1.4.3                RSQLite_2.2.0              
[19] backports_1.1.5             lattice_0.20-41            
[21] quantreg_5.54               glue_1.3.2                 
[23] digest_0.6.25               GenomicRanges_1.38.0       
[25] XVector_0.26.0              sandwich_2.5-1             
[27] Matrix_1.2-18               XML_3.99-0.3               
[29] pkgconfig_2.0.3             broom_0.5.5                
[31] biomaRt_2.42.0              SparseM_1.78               
[33] zlibbioc_1.32.0             purrr_0.3.3                
[35] BiocParallel_1.20.1         MatrixModels_0.4-1         
[37] tibble_2.1.3                openssl_1.4.1              
[39] mgcv_1.8-31                 generics_0.0.2             
[41] IRanges_2.20.2              SummarizedExperiment_1.16.1
[43] GenomicFeatures_1.38.2      BiocGenerics_0.32.0        
[45] SNPRelate_1.20.1            survival_3.1-11            
[47] magrittr_1.5                crayon_1.3.4               
[49] memoise_1.1.0               mice_3.8.0                 
[51] nlme_3.1-144                tools_3.6.3                
[53] prettyunits_1.1.1           hms_0.5.3                  
[55] lifecycle_0.2.0             matrixStats_0.56.0         
[57] stringr_1.4.0               S4Vectors_0.24.3           
[59] DelayedArray_0.12.2         gdsfmt_1.22.0              
[61] AnnotationDbi_1.48.0        Biostrings_2.54.0          
[63] compiler_3.6.3              GenomeInfoDb_1.22.0        
[65] logistf_1.23                rlang_0.4.5                
[67] grid_3.6.3                  RCurl_1.98-1.1             
[69] rappdirs_0.3.1              VariantAnnotation_1.32.0   
[71] bitops_1.0-6                DNAcopy_1.60.0             
[73] curl_4.3                    DBI_1.1.0                  
[75] R6_2.4.1                    GenomicAlignments_1.22.1   
[77] zoo_1.8-7                   rtracklayer_1.46.0         
[79] dplyr_0.8.5                 bit_1.1-15.2               
[81] stringi_1.4.6               parallel_3.6.3             
[83] Rcpp_1.0.4                  quantsmooth_1.52.0         
[85] vctrs_0.2.4                 dbplyr_1.4.2               
[87] tidyselect_1.0.0            lmtest_0.9-37
aarizvi commented 4 years ago

Is it possible that the vcf file you are using isn't for Sanger? We have an example in the vignette for that as well.

vcf.file <- system.file(package="gwasurvivr",
                        "extdata", 
                        "sanger.pbwt_reference_impute.vcf.gz")

And then try sangerCoxSurv

hhg7 commented 4 years ago

Is it possible that the vcf file you are using isn't for Sanger? We have an example in the vignette for that as well.

vcf.file <- system.file(package="gwasurvivr",
                        "extdata", 
                        "sanger.pbwt_reference_impute.vcf.gz")

And then try sangerCoxSurv

My goof, I didn't realize that reading the VCF file was done again :/